# Relationships between Toronto Apartment Rental Prices, Safety, and Neighborhood Clustering

## 1. Introduction

### 1.1 Background
As one of the biggest cities in Canada, Toronto welcomes a large number of people from all over the world who come to visit, work, and study. Its population is expected to grow to 3,560,000 by 2031, with an average annual growth of 41,000 [1]. This makes Toronto’s rental market quite competitive. For instance, the average rent for a 1-bedroom apartment was around $1,270 in 2019, an increase of 23% compared with 2013 [2]. For those about to settle in Toronto for the first time, renting a decent apartment is likely the first thing to do. Even without knowing much about the city, new tenants would still want to find a safe district, preferably with a convenient neighborhood, and of course within their budget.

### 1.2 Business Problem
In this project, rental price, regional safety, and neighborhood amenities are analyzed to help newcomers find an ideal place in Toronto efficiently. Prior to this capstone project, we had already clustered Toronto’s neighborhoods. As further steps, we will explore the following questions: Which parts of Toronto have lower criminal risk, and what are their expected rental prices? Does a higher rental price guarantee a safer area, and vice versa? What amenities does a high-quality community typically have in its neighborhood? Finally, what recommendations can be made for a student, a middle-class renter, and a retiree who are each looking for an apartment to rent in Toronto?

### 1.3 Expected Uses
The results of this project are expected to serve several purposes. They are mainly intended for people who need to rent an apartment and want a rough understanding of the Toronto rental market. Real estate agents can also use the results to get a clearer view of the advantages and disadvantages of their properties. In addition, investors can learn what the neighborhood of an upscale community in Toronto looks like and find out which communities currently lack essential amenities, which points to potential business opportunities.

## 2. Data Description
Two datasets are used in this project, both from kaggle.com:

- Killed or Seriously Injured (KSI) Toronto Clean: https://www.kaggle.com/jrmistry/killed-or-seriously-injured-ksi-toronto-clean/kernels
- Toronto Apartment Rental Price: https://www.kaggle.com/rajacsp/toronto-apartment-price



### 2.1 Rental Price Dataset
The rental price data originally consists of the following columns: number of bedrooms, number of bathrooms, den (living room), address, latitude, longitude, and rental price.

In [4]:
import numpy as np # library for vectorized numerical operations
import pandas as pd # library for data analysis
rental = pd.read_csv('Toronto_apartment_rentals_2018.csv')
rental.head()

Unnamed: 0,Bedroom,Bathroom,Den,Address,Lat,Long,Price
0,2,2.0,0,"3985 Grand Park Drive, 3985 Grand Park Dr, Mis...",43.581639,-79.648193,"$2,450.00"
1,1,1.0,1,"361 Front St W, Toronto, ON M5V 3R5, Canada",43.643051,-79.391643,"$2,150.00"
2,1,1.0,0,"89 McGill Street, Toronto, ON, M5B 0B1",43.660605,-79.378635,"$1,950.00"
3,2,2.0,0,"10 York Street, Toronto, ON, M5J 0E1",43.641087,-79.381405,"$2,900.00"
4,1,1.0,0,"80 St Patrick St, Toronto, ON M5T 2X6, Canada",43.652487,-79.389622,"$1,800.00"


Since rental price varies with apartment size, it is hard to compare and cluster one-bedroom and two-bedroom apartments together. One-bedroom apartments without a den make up the largest part of the dataset. To allow a reasonable comparison and clustering, only one-bedroom apartments with no den are considered in this project.

In [5]:
rental.groupby('Bedroom').count()  
# 1-bedroom listings are the largest group

Bedroom,Bathroom,Den,Address,Lat,Long,Price
1,749,749,749,749,749,749
2,334,334,334,334,334,334
3,41,41,41,41,41,41


In [11]:
rentalOne = rental.loc[rental['Bedroom'] == 1]
rentalOne.groupby('Den').count()
# listings with no den (Den == 0) are the largest group among the 1-bedroom data

Den,Bedroom,Bathroom,Address,Lat,Long,Price
0,577,577,577,577,577,577
1,172,172,172,172,172,172
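
To make "the largest group" concrete, the counts printed in the two outputs above can be turned into shares. This is only a small arithmetic sketch: the numbers are copied from those outputs, and the variable names are hypothetical, not part of the original notebook.

```python
# Counts copied from the groupby outputs above.
bedroom_counts = {1: 749, 2: 334, 3: 41}    # listings per bedroom count
den_counts_one_bedroom = {0: 577, 1: 172}   # 1-bedroom listings per den count

total_listings = sum(bedroom_counts.values())

# Share of 1-bedroom listings in the whole dataset.
share_one_bedroom = bedroom_counts[1] / total_listings

# Share of 1-bedroom listings that have no den.
share_no_den_within_one = den_counts_one_bedroom[0] / bedroom_counts[1]

# Share of the whole dataset kept after both filters.
share_selected = den_counts_one_bedroom[0] / total_listings

print(total_listings)                      # 1124
print(f"{share_one_bedroom:.1%}")          # 66.6%
print(f"{share_no_den_within_one:.1%}")    # 77.0%
print(f"{share_selected:.1%}")             # 51.3%
```

Under these counts, restricting the analysis to 1-bedroom apartments with no den still keeps roughly half of all listings.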


### Rental data pre-processing

In [12]:
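# keep only 1-bedroom listings without a den
rentalOne = rentalOne.loc[rentalOne['Den'] == 0]
# drop columns that are not needed for the analysis
rentalOne = rentalOne.drop(columns=['Address', 'Bathroom', 'Den'])
# use descriptive names for the coordinate columns
rentalOne = rentalOne.rename(columns={'Lat': 'Latitude', 'Long': 'Longitude'})
rentalOne.head()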

Unnamed: 0,Bedroom,Latitude,Longitude,Price
2,1,43.660605,-79.378635,"$1,950.00"
4,1,43.652487,-79.389622,"$1,800.00"
5,1,43.63489,-79.434654,"$1,729.00"
7,1,43.640918,-79.393982,"$1,900.00"
8,1,43.641308,-79.400093,"$1,900.00"
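
The Price column in the output above holds strings such as "$1,950.00", which cannot be averaged or clustered directly. A minimal sketch of one way to convert them to numbers follows. The helper name `parse_price` is hypothetical, and the sample values are the five rows shown above. The original notebook may handle this step differently.

```python
def parse_price(text: str) -> float:
    """Convert a price string such as "$1,950.00" to a float."""
    # Remove the currency symbol and the thousands separator before converting.
    return float(text.replace("$", "").replace(",", "").strip())


# The five Price values shown in the output above.
sample_prices = ["$1,950.00", "$1,800.00", "$1,729.00", "$1,900.00", "$1,900.00"]

numeric_prices = [parse_price(p) for p in sample_prices]
mean_price = sum(numeric_prices) / len(numeric_prices)

print(numeric_prices)
print(round(mean_price, 2))
```

With pandas, the same helper could be applied to the whole column, for example `rentalOne['Price'].map(parse_price)`.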


### 2.2 Killed or Seriously Injured (KSI) data
This dataset contains the accident number, the accident date (year, month, day), latitude, longitude, the type of crime, the outcome, and a description of the scene.

In [14]:
criminal = pd.read_csv('KSI_CLEAN.csv')
criminal.head()

Unnamed: 0,ACCNUM,YEAR,MONTH,DAY,HOUR,MINUTES,WEEKDAY,LATITUDE,LONGITUDE,Ward_Name,...,TRUCK,TRSN_CITY_VEH,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,FATAL
0,1249781,2011,8,4,23,18,3,43.651545,-79.38349,Toronto Centre-Rosedale (27),...,0,1,0,0,0,0,0,0,0,0
1,1311542,2012,8,19,23,18,6,43.780445,-79.30049,Scarborough-Agincourt (40),...,0,0,0,1,1,1,0,0,0,0
2,5002235651,2015,12,30,23,39,2,43.682342,-79.328266,Toronto-Danforth (30),...,0,0,0,0,0,1,0,0,0,1
3,1311542,2012,8,19,23,18,6,43.780445,-79.30049,Scarborough-Agincourt (40),...,0,0,0,1,1,1,0,0,0,0
4,1311542,2012,8,19,23,18,6,43.780445,-79.30049,Scarborough-Agincourt (40),...,0,0,0,1,1,1,0,0,0,0


### KSI data pre-processing

In this project, we are mainly interested in accident frequency, regardless of year and type of crime. To clean up the dataset, we keep only the accident number, neighborhood name, latitude, and longitude columns for further study.

In [15]:
criminal = criminal[['ACCNUM','Hood_Name','LATITUDE','LONGITUDE']]
criminal.head()

Unnamed: 0,ACCNUM,Hood_Name,LATITUDE,LONGITUDE
0,1249781,Bay Street Corridor (76),43.651545,-79.38349
1,1311542,Tam O'Shanter-Sullivan (118),43.780445,-79.30049
2,5002235651,Greenwood-Coxwell (65),43.682342,-79.328266
3,1311542,Tam O'Shanter-Sullivan (118),43.780445,-79.30049
4,1311542,Tam O'Shanter-Sullivan (118),43.780445,-79.30049
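
In the sample above, ACCNUM 1311542 appears three times, so counting raw rows would overstate the accident frequency. The sketch below shows one possible way to count distinct accidents per neighborhood. It assumes that each ACCNUM identifies a single accident, which the source does not state explicitly. The variable names are hypothetical, and the data are the five rows shown above.

```python
import pandas as pd

# The five rows shown in the output above.
sample = pd.DataFrame({
    "ACCNUM": [1249781, 1311542, 5002235651, 1311542, 1311542],
    "Hood_Name": [
        "Bay Street Corridor (76)",
        "Tam O'Shanter-Sullivan (118)",
        "Greenwood-Coxwell (65)",
        "Tam O'Shanter-Sullivan (118)",
        "Tam O'Shanter-Sullivan (118)",
    ],
    "LATITUDE": [43.651545, 43.780445, 43.682342, 43.780445, 43.780445],
    "LONGITUDE": [-79.38349, -79.30049, -79.328266, -79.30049, -79.30049],
})

# Assumption: one ACCNUM corresponds to one accident, so keep one row per ACCNUM.
unique_accidents = sample.drop_duplicates(subset="ACCNUM")

# Number of distinct accidents per neighborhood.
accidents_per_hood = unique_accidents.groupby("Hood_Name").size()

print(accidents_per_hood)
```

On this sample, each of the three neighborhoods ends up with one accident, whereas a plain row count would report three for Tam O'Shanter-Sullivan (118).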


### References
[1] Toronto population report: https://www.toronto.ca/legdocs/mmis/2019/ph/bgrd/backgroundfile-124480.pdf

[2] Toronto average rent prices: https://www.toronto.ca/community-people/community-partners/social-housing-providers/affordable-housing-operators/current-city-of-toronto-average-market-rents-and-utility-allowances/