# Applied Data Science Capstone Project

With interest rates at an all-time low, many buyers are looking for houses, including young adults who will be first-time homeowners and may not be familiar with the area. The purpose of this project is to give home buyers information about different areas, such as the points of interest in each neighborhood. I will focus primarily on a young couple, aged 25-35, looking for their first home in an area that is affordable, has neighbors in that age range, and has attractions and things to do nearby. The project focuses on the Toronto area.

### Importing necessary packages and reading postal code data for the Toronto area from Wikipedia

In [None]:
## Install necessary packages (uncomment and run once if they are missing)
# %pip install geopandas
# !conda install -c conda-forge folium=0.5.0 --yes

import pandas as pd
from pandas import DataFrame

import geopandas
import matplotlib.pyplot as plt

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

print ("done importing packages")


In [None]:
## Read the data
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
dfs = pd.read_html(url)
# Get the first table on the page, which holds the postal codes
df = dfs[0]


In [None]:
df.shape

In [5]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [6]:
df_copy = df.copy()  # independent copy, so later changes do not alter df

In [7]:
df_copy.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
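
Plain assignment binds a second name to the same DataFrame, whereas `.copy()` produces an independent object. The difference matters once one of the two is modified. A minimal sketch with toy data (the `toy`, `alias`, and `independent` names are illustrative and not part of the notebook):

```python
import pandas as pd

# Toy stand-in for the postal code table (two rows taken from the output above)
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A"],
    "Borough": ["Not assigned", "North York"],
})

alias = toy                # second name for the very same object
independent = toy.copy()   # separate object with its own data

# Modify the original and see which of the two reflects the change
toy.loc[0, "Borough"] = "Changed"

alias_sees_change = alias.loc[0, "Borough"] == "Changed"
copy_sees_change = independent.loc[0, "Borough"] == "Changed"
print(alias_sees_change, copy_sees_change)
```

Only the alias follows the edit; the copy keeps the original value.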


#### Creating a cleaned DataFrame copy that excludes boroughs marked "Not assigned"

In [8]:
df_cleaned = df_copy[df_copy['Borough'] != "Not assigned"]

In [9]:
df_cleaned.head(20)

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [10]:
df_cleaned.shape

(103, 3)
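
The filtering step keeps the original row labels, which is why the index in the output above skips values. A small sketch, assuming a contiguous index is wanted afterwards, shows the same boolean-mask filter followed by `reset_index(drop=True)`. The `toy`, `mask`, and `cleaned` names are illustrative only:

```python
import pandas as pd

# First four rows of the postal code table, as shown in the output above
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# True for every row whose borough has actually been assigned
mask = toy["Borough"] != "Not assigned"

# Keep the matching rows and renumber them from 0
cleaned = toy[mask].reset_index(drop=True)
print(cleaned)
```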

### Using a CSV to get the coordinates of Toronto postal codes, since Geocoder did not work

In [None]:
%pip install geocoder

In [None]:
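import geocoder  # geocoding client

# Look up a real postal code from the table rather than the literal column name
postal_code = df_cleaned['Postal Code'].iloc[0]

# Initialize the result to None
lat_lng_coords = None

# Try a limited number of times so a failing service cannot loop forever
for _ in range(5):
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng
    if lat_lng_coords is not None:
        break

# Unpack the coordinates only if the lookup succeeded
if lat_lng_coords is not None:
    latitude, longitude = lat_lng_coords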
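
Because the geocoding service may never return a result, it can help to wrap the lookup in a helper that gives up after a fixed number of attempts. The sketch below is one possible shape for that, not the notebook's method: `lookup_with_retry` and `fake_lookup` are hypothetical names, and `fake_lookup` stands in for a real geocoder call so the example runs offline.

```python
import time
from typing import Callable, Optional, Tuple

LatLng = Tuple[float, float]


def lookup_with_retry(
    lookup: Callable[[str], Optional[LatLng]],
    postal_code: str,
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[LatLng]:
    """Call `lookup` up to `attempts` times; return None if every call fails."""
    query = "{}, Toronto, Ontario".format(postal_code)
    for attempt in range(attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        # Pause between attempts, but not after the last one
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None


def fake_lookup(query: str) -> Optional[LatLng]:
    """Hypothetical stand-in for a geocoder call; knows a single postal code."""
    known = {"M3A, Toronto, Ontario": (43.7532586, -79.3296565)}
    return known.get(query)


found = lookup_with_retry(fake_lookup, "M3A")
missing = lookup_with_retry(fake_lookup, "M0Z")
print(found, missing)
```

A `None` result signals that a fallback, such as the CSV file used in this section, is needed.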

In [11]:
geo_data = "https://cocl.us/Geospatial_data"
geo_df = pd.read_csv(geo_data)

In [12]:
print(geo_df.head())
print(geo_df.shape)

  Postal Code   Latitude  Longitude
0         M1B  43.806686 -79.194353
1         M1C  43.784535 -79.160497
2         M1E  43.763573 -79.188711
3         M1G  43.770992 -79.216917
4         M1H  43.773136 -79.239476
(103, 3)


### Final DataFrame: merged_df

In [13]:
merged_df = pd.merge(df_cleaned, geo_df, on='Postal Code')

In [14]:
merged_df.shape

(103, 5)
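
The row count staying at 103 suggests every postal code found its coordinates. A slightly stricter check, sketched here on toy data, uses the `validate` and `indicator` parameters of `pd.merge` to confirm the keys are unique and to list any postal code without a match. The toy `coords` table deliberately omits M5A to show what an unmatched row looks like; all variable names are illustrative.

```python
import pandas as pd

# Toy stand-ins for df_cleaned and geo_df
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M1B"],
    "Latitude": [43.753259, 43.725882, 43.806686],
    "Longitude": [-79.329656, -79.315572, -79.194353],
})

# validate raises MergeError if a postal code is duplicated on either side;
# indicator adds a "_merge" column recording where each row came from
checked = pd.merge(
    boroughs,
    coords,
    on="Postal Code",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Postal codes present in the borough table but missing from the coordinates
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```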

In [15]:
merged_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


### Graphs


In [28]:
# Convert the coordinates to strings so they can be joined into one column
merged_df['Longitude'] = merged_df['Longitude'].astype(str)
merged_df['Latitude'] = merged_df['Latitude'].astype(str)

In [33]:
# Join latitude and longitude into a single "latitude, longitude" string
merged_df['Coordinates'] = merged_df['Latitude'].str.cat(merged_df['Longitude'], sep=", ")

In [34]:
merged_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Coordinates
0,M3A,North York,Parkwoods,43.7532586,-79.3296565,"43.7532586, -79.3296565"
1,M4A,North York,Victoria Village,43.7258823,-79.31557159999998,"43.725882299999995, -79.31557159999998"
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6542599,-79.3606359,"43.6542599, -79.3606359"
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.46476329999999,"43.718517999999996, -79.46476329999999"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623015,-79.3894938,"43.6623015, -79.3894938"
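
Plotting libraries generally expect numeric coordinates, so it may be worth keeping `Latitude` and `Longitude` as floats and converting to text only while building the combined column. A minimal sketch of that alternative, on toy rows taken from the output above (the `toy` name is illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    "Postal Code": ["M3A", "M5A"],
    "Latitude": [43.7532586, 43.6542599],
    "Longitude": [-79.3296565, -79.3606359],
})

# Convert to strings on the fly; the numeric columns themselves are untouched
toy["Coordinates"] = (
    toy["Latitude"].astype(str).str.cat(toy["Longitude"].astype(str), sep=", ")
)

print(toy.dtypes)
print(toy["Coordinates"].tolist())
```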


### Graph of Coordinates