## Introduction/Business Problem

### Problem Background
Canada is a popular destination for international students: between 2014 and 2017 the number of international students increased by 34%. Most of these students were in Toronto, Vancouver and Montreal. This steady growth in the number of foreign students increases the demand for student housing.

Finding a place to live is often hard for foreign students, since they have to take many factors into account, such as living expenses, amenities and transportation.

### Problem Description
When a company plans to build student apartments in Toronto, it is important to choose a location that will attract students. To do that, the following factors should be considered:

- public transport
- shopping and eating
- distance to universities

### Target Audience
The stakeholders of a construction project are interested in making a good investment: choosing the right neighborhood in Toronto so that all the apartments can be rented in a short period of time.

### Data Description
The data used for this problem is a list of postal codes in Toronto. It will be scraped from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and turned into a dataframe for further analysis.

After getting the information from that webpage, Foursquare location data will be used to find the neighborhood that best meets the criteria listed above.
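As a minimal sketch of how venue data could be turned into a ranking, the code below counts venues per neighborhood for two of the criteria (public transport, shopping and eating) and sorts by the total. It does not call Foursquare: the `venues` rows, the `CRITERIA` category mapping and the `score_neighborhoods` helper are all hypothetical, and distance to universities would need a separate, coordinate-based calculation.

```python
import pandas as pd

# Hypothetical venue records standing in for location data
# (one row per venue found near a neighborhood).
venues = pd.DataFrame(
    [
        ("Regent Park, Harbourfront", "Coffee Shop"),
        ("Regent Park, Harbourfront", "Bus Stop"),
        ("Regent Park, Harbourfront", "Grocery Store"),
        ("Victoria Village", "Pizza Place"),
        ("Victoria Village", "Park"),
    ],
    columns=["Neighborhood", "Venue Category"],
)

# Hypothetical mapping from venue categories to the criteria above.
CRITERIA = {
    "transport": {"Bus Stop", "Metro Station", "Train Station"},
    "shopping_eating": {"Coffee Shop", "Grocery Store", "Pizza Place", "Restaurant"},
}


def score_neighborhoods(venues: pd.DataFrame, criteria: dict) -> pd.DataFrame:
    """Count matching venues per neighborhood for each criterion and rank by the total."""
    counts = pd.DataFrame(index=venues["Neighborhood"].unique())
    for name, categories in criteria.items():
        matches = venues[venues["Venue Category"].isin(categories)]
        # Aligns on neighborhood; neighborhoods with no matches become NaN.
        counts[name] = matches.groupby("Neighborhood").size()
    counts = counts.fillna(0).astype(int)
    counts["total"] = counts.sum(axis=1)
    return counts.sort_values("total", ascending=False)


print(score_neighborhoods(venues, CRITERIA))
```

Equal weights are used here for simplicity; in practice each criterion could be weighted according to what the stakeholders value most.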

In [7]:
# import necessary libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


In [8]:
r = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(r, "html.parser")
tab = soup.find('table')       # first <table> on the page (the postal code table)
ro = tab.find_all('tr')        # all table rows, including the header row
nrows = len(ro)
nrows

181

In [12]:
title = tab.tr.text # extract header of table from webpage
title = title.split('\n')
h1 = title[1]
h2 = title[3]
h3 = title[5]
print("Header 1:", h1, "\nHeader 2:", h2, "\nHeader 3:",h3)

Header 1: Postal Code 
Header 2: Borough 
Header 3: Neighborhood
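Splitting a row's `.text` on `'\n'` and picking indices 1, 3 and 5 depends on how whitespace happens to be laid out in the HTML. A sketch that reads the cells directly is shown below, assuming the header row uses `<th>` cells and data rows use `<td>` cells; the `parse_table` helper and the small HTML sample are hypothetical, and the live page's markup may differ.

```python
from bs4 import BeautifulSoup


def parse_table(table):
    """Return (header, rows) from a BeautifulSoup <table> element.

    Header text comes from <th> cells and data rows from <td> cells;
    rows without <td> cells (such as the header row) are skipped.
    """
    header = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return header, rows


# Hand-written sample in the layout assumed above.
sample = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""
header, rows = parse_table(BeautifulSoup(sample, "html.parser").find("table"))
print(header)  # ['Postal Code', 'Borough', 'Neighborhood']
print(rows)    # [['M1A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods']]
```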


In [13]:
data = []
for i in range(1, nrows):                 # skip row 0, the header
    cells = ro[i].text.split('\n')
    data_PC = cells[1]                    # Postal Code
    data_B = cells[3]                     # Borough
    data_N = cells[5]                     # Neighborhood
    data_N = data_N.replace('/', ',')     # separate multiple neighborhoods with ','
    data_N = data_N.replace('Not assigned', data_B)  # unassigned neighborhoods take the borough's name
    if data_B != 'Not assigned':          # rows with unassigned boroughs are skipped
        data.append((data_PC, data_B, data_N))
         

In [14]:
df = pd.DataFrame(data, columns = [h1,h2,h3])    # Transform list into dataframe
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
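The cleaning rules applied in the loop can also be expressed column-wise on a DataFrame of raw rows. This is a sketch with a hypothetical `clean_postal_codes` helper and hypothetical sample rows; one small difference is that it replaces only cells whose whole value is `'Not assigned'`, whereas `str.replace` in the loop would also replace that text inside a longer string.

```python
import pandas as pd


def clean_postal_codes(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the loop's cleaning rules column-wise."""
    # Drop postal codes whose borough is 'Not assigned'.
    df = raw[raw["Borough"] != "Not assigned"].copy()
    # Separate multiple neighborhoods with ',' instead of '/'.
    df["Neighborhood"] = df["Neighborhood"].str.replace("/", ",", regex=False)
    # Neighborhoods marked 'Not assigned' take the borough's name.
    unassigned = df["Neighborhood"] == "Not assigned"
    df.loc[unassigned, "Neighborhood"] = df.loc[unassigned, "Borough"]
    return df.reset_index(drop=True)


# Hypothetical raw rows, before cleaning.
raw = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M7A", "Queen's Park", "Not assigned"),
        ("M5A", "Downtown Toronto", "Regent Park/Harbourfront"),
    ],
    columns=["Postal Code", "Borough", "Neighborhood"],
)
print(clean_postal_codes(raw))
```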


In [15]:
print(df.shape)

(103, 3)


There are 103 postal codes in Toronto with an assigned borough. To reduce the amount of information, only Toronto's boroughs will be considered.
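One possible reading of "Toronto's boroughs" is the boroughs whose names contain "Toronto" (for example Downtown Toronto). Under that assumption, the filter could look like the sketch below; `sample_df` holds three rows copied from `df.head()` above so the example runs on its own.

```python
import pandas as pd

# Three rows copied from df.head(), in the same column layout as df.
sample_df = pd.DataFrame(
    [
        ("M3A", "North York", "Parkwoods"),
        ("M5A", "Downtown Toronto", "Regent Park, Harbourfront"),
        ("M7A", "Downtown Toronto", "Queen's Park, Ontario Provincial Government"),
    ],
    columns=["Postal Code", "Borough", "Neighborhood"],
)

# Assumption: keep boroughs whose name contains 'Toronto' (case-sensitive, literal match).
toronto_df = sample_df[sample_df["Borough"].str.contains("Toronto", regex=False)].reset_index(drop=True)
print(toronto_df.shape)  # (2, 3)
```

On the full dataframe, the same expression would be applied to `df` instead of `sample_df`.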