# The Battle of the Neighborhoods - Week 1

## Introduction: Business Problem

### Background

Toronto is the largest and most populous city in Canada and the provincial capital of Ontario, with a population of 2,731,571 in 2016. It is an international business centre in North America and the financial capital of Canada.

Toronto is also the largest centre of education, research and innovation in Canada. Its education system combines public and private schools at both the elementary and secondary levels. All schools take their curricular mandate from the Ontario Ministry of Education.

There are four types of school boards in Ontario. Depending on language, religious background, or personal choice, students can attend English Public, English Catholic, French Public, or French Catholic schools.

Publicly funded education is divided into three stages: early childhood education, for children from birth to age 6; elementary school, for students from kindergarten to grade 8; and secondary school, for students from grade 9 to 12.

### Business Problem

##### What is EQAO
- The Education Quality and Accountability Office (EQAO) is an independent agency of the Government of Ontario. It develops and oversees the reading, writing and mathematics tests that Ontario students must take in Grades 3, 6, 9, and 10.
- The EQAO test results give parents, teachers, principals and school boards information about how well students have learned what the province expects them to learn in reading, writing and mathematics.

##### EQAO results
- Only half of Ontario's Grade 6 students met the provincial standards for math in the 2016-2017 academic year, down seven percentage points from 2013. Meanwhile, 62 per cent of Grade 3 students met the provincial math standards, a decrease of five percentage points from 2014.
- For Grade 9 students, only 44 per cent met the standard in applied math in 2017-2018, a decline compared with 2013-2014.

##### Impacts
- After the release of the 2017-2018 EQAO standardized testing results, the Ontario Government announced a four-year math strategy.
- Ontario will spend more than $55 million this year hiring math learning leads for school boards, providing “extensive” training in elementary and secondary schools, and expanding other programs like tutoring.
- The 2017-2018 EQAO results have raised public concern, and a growing number of students are using or searching for private tutoring services. https://www.cbc.ca/news/canada/toronto/ontario-math-curriculum-private-services-1.4445472

### Business Opportunity

Considering opening an after-school tutoring service in Toronto? According to the Wall Street Journal’s Smart Money Magazine, now could be the perfect time to get into the education business.

Let's explore data collected from multiple sources and arrange it in a dataframe for analysis, so that we can recommend target locations across different areas based on what the data shows.

## Data Description

Data collection and processing can take up to 80% of the time in a data science project. How data is gathered and analyzed depends on many factors, including the content, the indicators that can be used to identify the problem, the integrity of the data source, and the size of the data.

Several aspects should be considered in the data collection for this project.

- The number of schools in a neighborhood: the more schools an area has, particularly public schools, the higher the demand for tutoring services.
- The school ranking: the lower a school's ranking, the more of its students are likely to look for tutoring services to improve academically.
- The number of tutoring services: to limit competition and improve the chance of business success, an area with few or no existing tutoring businesses is a better candidate for opening one.
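
The three aspects above could be combined into a single opportunity score per neighborhood. The following is only a minimal sketch under stated assumptions: the field names (`school_count`, `avg_rating`, `tutor_count`), the equal weighting, and the sample figures are all hypothetical and are not taken from the project's data.

```python
# Hypothetical sketch: rank areas by tutoring-business opportunity.
# All field names and numbers below are illustrative, not real project data.

def min_max(values):
    """Scale a list of numbers to the range [0, 1]; a constant list maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def opportunity_scores(areas):
    """Return {name: score}; a higher score means a more promising location.

    More schools and a weaker average rating raise the score, while more
    existing tutoring services lower it (equal weights are an assumption).
    """
    schools = min_max([a["school_count"] for a in areas])
    ratings = min_max([a["avg_rating"] for a in areas])
    tutors = min_max([a["tutor_count"] for a in areas])
    return {
        a["name"]: round(s + (1 - r) + (1 - t), 3)
        for a, s, r, t in zip(areas, schools, ratings, tutors)
    }


sample_areas = [
    {"name": "Area A", "school_count": 12, "avg_rating": 5.0, "tutor_count": 1},
    {"name": "Area B", "school_count": 4, "avg_rating": 8.0, "tutor_count": 6},
    {"name": "Area C", "school_count": 8, "avg_rating": 6.5, "tutor_count": 0},
]

scores = opportunity_scores(sample_areas)
best = max(scores, key=scores.get)
print(scores, best)
```

The weights would need to be tuned once real data is available; equal weights are used here only to keep the sketch simple.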

At least three data sources are required for this project to support the analysis and the resulting business recommendation.

- Toronto schools data: This can be collected from the City of Toronto open data portal, https://www.toronto.ca/city-government/data-research-maps/open-data/. It lists the schools currently operating in Toronto, with their names and addresses.

- Toronto neighborhood data: This geographic data comes from Wikipedia and the Foursquare API. We can use it to relate school locations to potential locations for opening a business.

- Toronto school ratings: The school ranking is the key aspect of this project. Although parents may choose to relocate for a better school for their children, relocating is a time-consuming and stressful process, which makes tutoring an appealing alternative. The ranking data is updated yearly and can be found on the Fraser Institute website, https://www.fraserinstitute.org/school-performance.
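
One way the school locations and the neighborhood coordinates described above could be linked is to assign each school to its nearest postal-code centroid. The sketch below assumes schools have already been geocoded to latitude and longitude. The three centroids are taken from the coordinates file used later in this notebook, while the school location and the helper names (`haversine_km`, `nearest_postal_code`) are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def nearest_postal_code(lat, lon, centroids):
    """Return the postal code whose centroid is closest to (lat, lon)."""
    return min(centroids,
               key=lambda code: haversine_km(lat, lon, *centroids[code]))


# Postal-code centroids (latitude, longitude)
centroids = {
    "M1B": (43.806686, -79.194353),
    "M1C": (43.784535, -79.160497),
    "M1E": (43.763573, -79.188711),
}

# Hypothetical school location, for illustration only
school = (43.7850, -79.1600)
print(nearest_postal_code(*school, centroids))
```

With every school mapped to a postal code, school counts and average ratings can then be aggregated per neighborhood.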

## Data preparation

In [35]:
# Import libraries
import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt # plotting library

# Data collection
import json # library to handle JSON files
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

# Map
!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


#### Get Toronto Neighborhood data

In [38]:
# Collect Toronto neighborhood data
content = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
# parse data from the html into a beautifulsoup object
data = BeautifulSoup(content, 'html.parser')

In [39]:
# Process data
postalCodeList = []
boroughList = []
neighborhoodList = []

In [40]:
# Loop through table content; store Postal Code, Borough, and Neighborhood data into each list
# <tr><td>M9B</td><td><a href="/wiki/Etobicoke" title="Etobicoke">Etobicoke</a></td><td><a class="mw-redirect" href="/wiki/Islington,_Toronto" title="Islington, Toronto">Islington</a></td></tr>
for row in data.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) >= 3:  # skip header rows and rows missing a column
        postalCodeList.append(cells[0].text)
        boroughList.append(cells[1].text)
        neighborhoodList.append(cells[2].text.rstrip('\n'))
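
The parsing step can be tried offline on a small HTML fragment before running it against the live page. The sketch below is an illustration only: `sample_html` is a hand-written fragment modeled on the row shown in the comment above, and `parse_rows` is a hypothetical helper that strips whitespace from every cell and skips rows with fewer than three `<td>` cells.

```python
from bs4 import BeautifulSoup

# Hand-written fragment for illustration; the real page layout may differ.
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned
</td></tr>
  <tr><td>M9B</td><td><a href="/wiki/Etobicoke">Etobicoke</a></td><td><a href="/wiki/Islington,_Toronto">Islington</a>
</td></tr>
  <tr><td>broken row</td></tr>
</table>
"""


def parse_rows(html):
    """Return (postal code, borough, neighborhood) tuples from the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    records = []
    if table is None:
        return records
    for row in table.find_all("tr"):
        # get_text(strip=True) removes surrounding whitespace and newlines
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) >= 3:  # header rows have no <td>; malformed rows are skipped
            records.append(tuple(cells[:3]))
    return records


records = parse_rows(sample_html)
print(records)
```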

In [41]:
# Define a dataframe with three columns: PostalCode, Borough, and Neighborhood
df_toronto = pd.DataFrame({"PostalCode": postalCodeList,
                           "Borough": boroughList,
                           "Neighborhood": neighborhoodList})

df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [42]:
# Ignore cells with a borough that is Not assigned.
df_toronto_dropna = df_toronto[df_toronto.Borough != "Not assigned"].reset_index(drop=True)
#df_toronto_dropna.head(10)

# Combine neighborhoods that share the same postal code and borough
df_toronto_grouped = df_toronto_dropna.groupby(["PostalCode", "Borough"], as_index=False).agg(lambda x: ", ".join(x))
df_toronto_grouped.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [43]:
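# For neighborhoods that are "Not assigned", use the borough name instead.
# A boolean mask with .loc writes to the dataframe itself; pandas does not
# guarantee that assigning to rows yielded by iterrows() does.
not_assigned = df_toronto_grouped["Neighborhood"] == "Not assigned"
df_toronto_grouped.loc[not_assigned, "Neighborhood"] = df_toronto_grouped.loc[not_assigned, "Borough"]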
        
df_toronto_grouped.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


In [44]:
# Show the shape (rows, columns) of the dataframe
df_toronto_grouped.shape

(103, 3)

In [46]:
# Load the CSV file that has the geographical coordinates of each postal code
coordinates = pd.read_csv("Geospatial_Coordinates.csv")
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [47]:
# Make column name consistent
coordinates.rename(columns={"Postal Code": "PostalCode"}, inplace=True)

In [48]:
# Merge the two dataframes on postal code
df_toronto_coordinates = df_toronto_grouped.merge(coordinates, on="PostalCode", how="left")
df_toronto_coordinates.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
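
Because the merge uses `how="left"`, any postal code missing from the coordinates file would silently end up with empty `Latitude` and `Longitude` values. A minimal sketch of one way to check for this, using the `indicator` option of `merge` on toy data (the postal code `M9Z` is made up to force a mismatch):

```python
import pandas as pd

# Toy data for illustration; "M9Z" is a made-up code with no coordinates.
left = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
right = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
checked = left.merge(right, on="PostalCode", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty list means every postal code received coordinates, which matters because markers with missing coordinates cannot be placed on the map.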


In [49]:
# Leverage the geopy library to get the latitude and longitude values of Toronto
address = 'Toronto, Ontario'
# Define geo user agent
geolocator = Nominatim(user_agent="toronto_explorer")

# get the geographical coordinates of Toronto
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [50]:
neighborhood_latitude = latitude
neighborhood_longitude = longitude

#print(neighborhood_latitude)
#print(neighborhood_longitude)

In [53]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

In [54]:
# add markers to map, use dataframe df_toronto_coordinates from part2
for lat, lng, borough, neighborhood in zip(df_toronto_coordinates['Latitude'], df_toronto_coordinates['Longitude'], df_toronto_coordinates['Borough'], df_toronto_coordinates['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### The activities below will be completed in Week 2

#### Leverage Foursquare API to explore the neighborhoods

In [57]:
# GET request and examine the response
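
As a rough outline of what this request might look like, the sketch below only builds the request URL and does not call the API. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` parameters; the helper name `build_explore_url`, the default values and the placeholder credentials are hypothetical, and the endpoint should be checked against the current Foursquare documentation before use.

```python
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build a URL for the (assumed) legacy Foursquare v2 explore endpoint."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                      # API version date (assumed value)
        "ll": "{},{}".format(lat, lng),    # latitude,longitude of the search centre
        "radius": radius,                  # search radius in metres
        "limit": limit,                    # maximum number of venues to return
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; real values would come from a Foursquare developer account.
url = build_explore_url(43.653963, -79.387207, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()` to examine the response.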

#### Leverage some functions learned from course

In [59]:
# functions

#### Get Toronto Schools data

In [60]:
# Load the CSV file that has the Toronto schools data for all types
#schools = pd.read_csv("Toronto-School-locations-all-types.csv")
#schools.head()

In [61]:
# Process schools data

#### Get schools ranking data

In [62]:
# Load the CSV file from the Fraser Institute that has the Toronto schools ranking info

In [63]:
# process school ranking data