# Real-Estate Investment Analysis of Toronto Neighbourhoods

### Applied Data Science Capstone Project - IBM

## Introduction 

**Real Estate Investment** involves purchasing, managing, renting and selling real or physical property for a profit.
Such an investment requires a large amount of capital, and succeeds only with careful planning and decision making.

The *goal of this project* is a **Neighbourhood-wise Analysis** of **Residential Real Estate** in the **140 Neighbourhoods of Toronto**, to determine which Neighbourhoods could potentially offer a good return on investment.

**Toronto** is the largest and most populous city in Canada, known for being *diverse, multicultural and home to world-class amenities*, making it a great place to invest in a house!

When investing in a property, an overall market analysis will provide a good idea of the trends, but will not suffice. It is imperative to buy property in the **right neighbourhood**, because even if the overall market is great, a wrong location may lead to decreased property value in the future.

A Neighbourhood Analysis will reveal the **investment potential** of various neighbourhoods, based on their characteristics.

### Target Audience

This Analysis is targeted towards Companies & Individuals who are interested in investing in residential property in Toronto. Whether it be a **real estate firm** looking for an apt location to start a new project, or **individuals** who wish to invest in a home in Toronto to live in or simply to rent out, this analysis will provide a comprehensive understanding of the most suitable neighbourhood according to one's needs.

Moreover, a Neighbourhood analysis not only reveals the investment potential for Residential Real Estate, but can also be applied to any other type of real estate!




## Data

#### Data pertaining to the following factors is used in the Neighbourhood Analysis:

### **Location** : 

This is the most **important** factor in Real Estate Investment. A **good location** is in **close proximity** to transport, schools, recreation, shopping, employment and various other Amenities.

**Amenities** are enhancing features which benefit a location, contribute to its enjoyment and *increase its value*. Amenities include parks, hospitals, restaurants, gyms, etc.

The Location data will be collected using the **explore endpoint** of the **Foursquare API**.
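As a rough sketch of what such a request could look like, the snippet below assembles a URL for the Foursquare v2 `venues/explore` endpoint around one neighbourhood's coordinates. The credentials are placeholders, and the function name, the 500 m radius, the limit of 100 venues and the version date are illustrative assumptions, not values taken from this project.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Return a venues/explore request URL for the point (lat, lng)."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (YYYYMMDD), assumed
        "ll": f"{lat},{lng}",            # centre of the search
        "radius": radius,                # search radius in metres, assumed
        "limit": limit,                  # maximum number of venues, assumed
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Coordinates of Agincourt North, one of the 140 neighbourhoods
url = build_explore_url(43.808038, -79.266439, "YOUR_ID", "YOUR_SECRET")
```

The JSON response could then be fetched with `requests.get(url).json()` and flattened into a table of nearby venues.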
                 



### **Demographics** : 

The **Population, Age, Educational Attainment, Crime Rate and Median Income Levels** of a Neighbourhood have a great impact on the *value and investment potential* of the Houses in it.

Therefore, it's essential to analyze the Demographics of neighbourhoods, and rank them accordingly.

The Demographic data is taken from the **'Neighbourhood Profiles'** Dataset of the [City of Toronto’s Open Data Portal](https://open.toronto.ca/dataset/neighbourhood-profiles/).

  
  



### **Real Estate Market** : 

The following Real Estate Data are factors that influence the value of a property:

* House Values
* Structure Type
* Number of Bedrooms
* Year of Construction
* Rental Rates
* Total Sales and Sales Volume
* Sales Price to List Price Ratio
* Average Days on Market
     
The real estate data is collected from the **Community Reports** and **Market Watch** of [TRREB (Toronto Regional Real Estate Board)](https://trreb.ca/index.php/market-news/community-reports) and [Realosophy](https://www.realosophy.com/toronto/neighbourhood-map).


                      
### **Neighbourhood Maps**


To visualize the data geospatially, the **Folium** library is used. The **Coordinates** of each neighbourhood in Toronto are taken from [Geodatos](https://www.geodatos.net/en/coordinates), as it provided more accurate coordinates than the Geopy library. The GeoJSON file for mapping the boundaries of the neighbourhoods is taken from the City of Toronto’s Open Data Portal.



## Let's have a look at the Data

### Importing Libraries

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows',None)
import json
import requests
from pandas import json_normalize # pandas.io.json.json_normalize is deprecated since pandas 1.0
from geopy.geocoders import Nominatim 
import folium
from folium import GeoJson

## Demographic and Real Estate Data

#### The *Demographic* and *Real Estate* Data have been cleaned by me and saved to a new csv file, which is provided in my Git repository.

In [3]:
data=pd.read_csv('TorontoREdata.csv')
data.drop('Unnamed: 0', axis=1, inplace=True)
data.head()

Unnamed: 0,Neighbourhood,"Population,2016",Population Growth,Children (0-14 years),Youth (15-24 years),Working Age(25-64 years),Seniors (65+ years),High School,Bachelors Degree,Masters degree,Doctorate,Median Household Income,Employment Rate %,Umemployment Rate %,Total Crime,Crime Rate,Detached house,Semi-Detached house,Apartment/Condo,Other attached Dwelling,Row house,Owned House,Rented House,1 Bedroom,2 Bedrooms,3 Bedrooms,4+ Bedrooms,Constructed<1960,Constructed 1961-80,Constructed 1991-2000,Constructed 2001-05,Constructed 2006-10,Constructed 2011-16,Home Sales,Sales Volume($),Average Price,Median Price,New Listings,Active Listings,Avg SP/LP,AvgDaysonMarket,Median Rent($)
0,Agincourt North,29113,-3.9,3840,3705,15535,6970,4240,3090,735,85,64686,50.0,9.8,160,5.495827,3345,805,3500,3660,1440,7395,1720,965,1790,2860,3490,245,4905,895,720,350,45,68,50774888,746690,766400,102,30,1.06,13,1750
1,Agincourt South-Malvern West,23757,8.0,3075,3360,13230,4660,3410,3270,920,85,61992,53.2,9.8,255,10.733678,2790,330,4450,2265,515,5890,2250,915,2235,2560,2405,835,3275,640,170,370,1200,120,83300152,694168,560000,207,69,1.02,19,1900
2,Alderwood,12054,1.3,1760,1235,7045,2335,1825,1415,390,25,83249,62.4,6.1,103,8.544881,2840,545,1145,1525,85,3675,950,325,1145,2055,1090,2905,1030,165,75,115,120,64,74920500,1170633,1060000,103,33,1.01,20,2425
3,Annex,30526,4.6,2360,3750,18520,6950,2060,6855,3930,870,71053,65.8,6.7,455,14.905327,645,1185,13415,7215,595,6060,9870,6995,4555,2000,1540,6920,4080,950,715,840,940,100,175689955,1756900,1288500,401,221,0.99,24,4200
4,Banbury-Don Mills,27695,2.9,3605,2730,14365,8615,2265,4895,1970,245,77547,55.6,7.2,180,6.499368,3485,285,7625,2370,740,7390,4735,3010,4245,2545,2280,2970,3675,1185,950,990,540,121,155776209,1287407,870000,256,98,0.98,21,3750


### A brief description of the dataset is given below:

The columns from **'Population,2016'** to **'Crime Rate'** are the **Demographic Features**, which include:

* **Population,2016**: Total population of each neighbourhood as per the 2016 Census
* **Population Growth**: Percentage growth of population in each neighbourhood
* **Age groups (Children - Seniors)** residing in each neighbourhood
* **Educational Attainment (High School - Doctorate)** of the neighbourhood population
* **Median Household Income**: This is the Median income **before Tax**
* **Employment and Unemployment Rates**
* **Total Crime & Crime Rate**: Crime Rate is expressed as crime **per 1,000** people (Total Crime/Population * 1000)
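The crime-rate formula can be checked against the sample rows shown earlier. This is a minimal sketch; the helper name is hypothetical and the inputs are copied by hand from the Agincourt North and Alderwood rows.

```python
def crime_rate_per_1000(total_crime, population):
    """Crimes per 1,000 residents: Total Crime / Population * 1000."""
    return total_crime / population * 1000


# 'Total Crime' and 'Population,2016' values from the table above
agincourt_north = crime_rate_per_1000(160, 29113)
alderwood = crime_rate_per_1000(103, 12054)

# These should agree with the 'Crime Rate' column for the same rows
print(round(agincourt_north, 6), round(alderwood, 6))
```

On the full table, the same calculation would be `data['Total Crime'] / data['Population,2016'] * 1000`.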

#### A Neighbourhood with good Population growth, high Educational attainment, a greater Median income, a high employment rate, a low unemployment rate and a low crime rate is generally considered to be an IDEAL Neighbourhood.

The columns from **'Detached house'** to **'Median Rent($)'** are the **Real Estate Data**, which include:

* **Structure** (Detached house - Row house): Number of dwellings of each type (detached houses, semi-detached houses, apartments/condos, other attached dwellings and row houses) per neighbourhood
* **Owned vs Rented**: Number of houses which are owner-occupied and renter-occupied, per neighbourhood
* **Number of Bedrooms**: Number of houses with 1, 2, 3, or 4 or more bedrooms
* **Year of Construction**: Number of houses built in each construction period, from before 1960 to 2016
* **Home Sales**: Number of homes sold per neighbourhood 
* **Sales Volume**: Total Sales Volume in dollars, per neighbourhood
* **Average Price**: Average home price per neighbourhood
* **Median Price**: Median home price per neighbourhood
* **New Listings**: Number of new sale listings in the market for each neighbourhood
* **Active Listings**: Number of listings active in the market for each neighbourhood
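* **Average SP/LP**: This is the **Average Sale Price to List Price Ratio**, a very important indicator in real estate investment analysis. It is the **Final Sale Price / Last List Price**. In this dataset it is stored as a ratio rather than a percentage, so a value of 1.06 means homes sold on average 6% above their list price.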
* **Average Days on Market**: It measures the number of days a listing has been on the market, until sold.
* **Median Rent($)**: The median rent for a house per neighbourhood
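Two of these columns can be sanity-checked with simple arithmetic. The sketch below uses hypothetical helper names; the relationship Average Price = Sales Volume / Home Sales is inferred from the sample rows rather than stated by the data source, so treat it as an assumption.

```python
def average_price(sales_volume, home_sales):
    """Average sale price implied by total sales volume and number of sales."""
    return sales_volume / home_sales


def sp_lp_premium_pct(avg_sp_lp):
    """Percentage above (+) or below (-) list price implied by an SP/LP ratio."""
    return (avg_sp_lp - 1) * 100


# Agincourt North: Sales Volume($) = 50774888, Home Sales = 68
avg_price = round(average_price(50774888, 68))

# Agincourt North (1.06) sold above list; Banbury-Don Mills (0.98) sold below
premium = round(sp_lp_premium_pct(1.06), 1)
discount = round(sp_lp_premium_pct(0.98), 1)
```

A ratio above 1 suggests buyers are bidding over the asking price, while a ratio below 1 suggests sellers are accepting less than they listed for.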


#### A Neighbourhood with greater home sales and sales volume, a high average SP/LP and low average days on market indicates that its properties are in high demand, selling quickly and favoured by investors.
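One rough way to shortlist in-demand neighbourhoods from these indicators is to sort by them. The sketch below uses only the five sample rows shown earlier, and the choice of sort keys (highest SP/LP first, then fewest days on market) is an illustrative assumption, not necessarily the ranking method used in this project.

```python
import pandas as pd

# Demand indicators copied from the five sample rows of the dataset
sample = pd.DataFrame({
    "Neighbourhood": ["Agincourt North", "Agincourt South-Malvern West",
                      "Alderwood", "Annex", "Banbury-Don Mills"],
    "Home Sales": [68, 120, 64, 100, 121],
    "Avg SP/LP": [1.06, 1.02, 1.01, 0.99, 0.98],
    "AvgDaysonMarket": [13, 19, 20, 24, 21],
})

# Highest sale-to-list ratio first; ties broken by fewest days on market
ranked = sample.sort_values(
    by=["Avg SP/LP", "AvgDaysonMarket"],
    ascending=[False, True],
).reset_index(drop=True)

print(ranked[["Neighbourhood", "Avg SP/LP", "AvgDaysonMarket"]])
```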


## Neighbourhood Coordinates

#### The coordinates of the neighbourhoods are in a csv file, also available in my Git repository

In [11]:
coordinates= pd.read_csv('newcoor.csv')
coordinates.head()

Unnamed: 0.1,Unnamed: 0,Latitude,Longitude,Neighbourhood
0,0,43.808038,-79.266439,"Agincourt North,Toronto,ON"
1,1,43.78866,-79.26561,"Agincourt South-Malvern West,Toronto,ON"
2,2,43.601717,-79.545233,"Alderwood,Toronto,ON"
3,3,43.670338,-79.407117,"Annex,Toronto,ON"
4,4,43.73766,-79.34972,"Banbury-Don Mills,Toronto,ON"


In [12]:
coordinates.drop('Unnamed: 0',axis=1,inplace=True)
coordinates.dtypes

Latitude         float64
Longitude         object
Neighbourhood     object
dtype: object

In [13]:
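# Longitude was read as strings (dtype object), so convert it to float
coordinates['Longitude']=coordinates['Longitude'].astype(str).astype(float)
# Use the clean names from `data`; this relies on both files listing the neighbourhoods in the same row order
coordinates['Neighbourhoods']=data['Neighbourhood']
coordinates.drop('Neighbourhood',axis=1,inplace=True)
coordinates.head()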

Unnamed: 0,Latitude,Longitude,Neighbourhoods
0,43.808038,-79.266439,Agincourt North
1,43.78866,-79.26561,Agincourt South-Malvern West
2,43.601717,-79.545233,Alderwood
3,43.670338,-79.407117,Annex
4,43.73766,-79.34972,Banbury-Don Mills
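Copying the names across from the other table works only while both files list the neighbourhoods in the same row order. An alternative that does not depend on row order is to strip the `,Toronto,ON` suffix from the names already in the coordinates file. This is a minimal sketch; the helper name is hypothetical and the suffix is taken from the sample rows shown earlier.

```python
SUFFIX = ",Toronto,ON"  # suffix seen in the raw 'Neighbourhood' column


def clean_neighbourhood(raw):
    """Strip surrounding whitespace and the trailing ',Toronto,ON' suffix."""
    name = raw.strip()
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    return name


cleaned = [clean_neighbourhood(n) for n in
           ["Agincourt North,Toronto,ON", "Banbury-Don Mills,Toronto,ON"]]
```

On the DataFrame this could be applied with `coordinates['Neighbourhood'].map(clean_neighbourhood)`.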


## Let's get the coordinates of Toronto and visualize the 140 neighbourhoods and their boundaries with the Folium library

In [7]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="CA_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
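Geocoding depends on an external service, and `geocode` returns `None` when an address is not found. The sketch below wraps the lookup with a fallback to the coordinates printed in the cell output. The function name and the fallback behaviour are assumptions for illustration; the geocoder is passed in as a callable so that the helper can be tried without network access.

```python
# Coordinates of Toronto as printed in the cell output
TORONTO_FALLBACK = (43.6534817, -79.3839347)


def city_coordinates(geocode, address, fallback=TORONTO_FALLBACK):
    """Return (latitude, longitude) for address, or fallback if lookup fails."""
    try:
        location = geocode(address)
    except Exception:
        # Broad catch for brevity: timeouts and service errors fall back too
        return fallback
    if location is None:
        # The geocoder found no match for the address
        return fallback
    return (location.latitude, location.longitude)


# Stand-in geocoder that finds nothing, to exercise the fallback path
latitude, longitude = city_coordinates(lambda address: None, "Toronto, CA")
```

With geopy, the call would be `city_coordinates(Nominatim(user_agent="CA_explorer").geocode, 'Toronto, CA')`.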


In [14]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11) 

geo=r'Neighbourhoods.json' 
with open(geo, encoding="utf8") as file: #the file is closed automatically after reading
    text = file.read()

GeoJson(text).add_to(map_toronto) #adding boundaries to the map using a GeoJson file

#adding markers to the neighbourhoods

for lat, lng, neighborhood in zip(coordinates['Latitude'],coordinates['Longitude'],coordinates['Neighbourhoods']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto
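Before drawing the boundaries, it can be useful to confirm that the GeoJSON text contains the expected number of neighbourhoods. The sketch below assumes the file is a GeoJSON `FeatureCollection` with one feature per neighbourhood, which is not verified here; the function name and the tiny inline sample are hypothetical.

```python
import json


def neighbourhood_feature_count(geojson_text):
    """Return the number of features in a GeoJSON FeatureCollection string."""
    geo = json.loads(geojson_text)
    if geo.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return len(geo["features"])


# Tiny stand-in document; the real text would come from Neighbourhoods.json
sample = ('{"type": "FeatureCollection", "features": '
          '[{"type": "Feature", "properties": {}, "geometry": null}]}')
count = neighbourhood_feature_count(sample)
```

Calling `neighbourhood_feature_count(text)` on the boundary file read above would be expected to return 140 if every neighbourhood has exactly one boundary feature.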