# Segmenting and Clustering Neighborhoods in Toronto City

## 1. Setup

### 1.1 Import Libraries

In [42]:
import pandas as pd
import io
import requests

### 1.2 Notebook Settings

In [43]:
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

## 2. Getting Data

All data on Toronto neighborhoods is scraped from this [Wikipedia page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) using [BeautifulSoup](http://beautiful-soup-4.readthedocs.io/en/latest/), a Python library for web scraping. That task was performed in this [notebook](https://github.com/felipetestaaa/coursera_capstone/blob/master/week3_clustering_neighborhoods_toronto_part1.ipynb).

### 2.1 Getting Data from the Previous Notebook

In [46]:
df_postcode = pd.read_csv('data_toronto_part1.csv')
df_postcode.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Highland Creek, Port Union, Rouge Hill"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


### 2.2 Getting Latitude and Longitude

Now that we have a dataframe with the postal code, borough name, and neighborhood name of each neighborhood, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

The suggested approach is to use the Google Maps Geocoding API to get the latitude and longitude coordinates of each neighborhood. However, that API is a paid service, so the [Geocoder](https://geocoder.readthedocs.io/index.html) Python package is the proposed alternative.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for a postal code can return None, and the same call made again can return the coordinates. To make sure coordinates are obtained for all neighborhoods, you can run a while loop for each postal code that repeats the call until it succeeds.

Because this package can be very unreliable, the latitude and longitude were instead imported from a CSV file containing the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

In [47]:
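# Download the CSV of coordinates per postal code and load it into a dataframe
url = "http://cocl.us/Geospatial_data"
data = requests.get(url).content
df_lat_long = pd.read_csv(io.StringIO(data.decode('utf-8')))
# Rename the key column so it matches df_postcode for the merge
df_lat_long = df_lat_long.rename(columns={'Postal Code': 'Postcode'})
df_lat_long.head()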

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### 2.3 Merging data

In [48]:
df = pd.merge(df_postcode, df_lat_long, on=['Postcode'], how='left')
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Port Union, Rouge Hill",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [50]:
df.shape

(103, 5)
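A left join keeps every row of the left dataframe even when no coordinates match, so the shape alone does not show whether every postal code received a latitude and longitude. One way to check this is sketched below on a three-row sample copied from the tables above. The `validate` and `indicator` arguments are standard `pd.merge` parameters, but their use here is an illustration and not part of the original notebook; `merged` and `missing` are names introduced for the example.

```python
import pandas as pd

# Small sample mirroring the first rows shown above
df_postcode = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": [
        "Malvern, Rouge",
        "Highland Creek, Port Union, Rouge Hill",
        "Guildwood, Morningside, West Hill",
    ],
})
df_lat_long = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# validate raises MergeError if a postal code is duplicated on either side;
# indicator adds a "_merge" column recording where each row came from.
merged = pd.merge(
    df_postcode,
    df_lat_long,
    on="Postcode",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Rows tagged "left_only" found no coordinates in df_lat_long
missing = merged[merged["_merge"] == "left_only"]
print(len(merged), len(missing))

# Drop the helper column before further use
merged = merged.drop(columns="_merge")
```

If `missing` is non-empty, its `Postcode` values identify the postal codes that still need coordinates.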

## 3. Exporting Data

In [51]:
df.to_csv('data_toronto_part2.csv', index=False)
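The effect of `index=False` can be seen with a small round trip. The sketch below writes a two-row sample to in-memory buffers rather than to disk; the sample data and the names `without_index` and `with_index` are introduced only for this illustration.

```python
import io

import pandas as pd

sample = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Export without the index, then read the result back
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
without_index = pd.read_csv(buffer)

# Export with the default index=True for comparison
buffer = io.StringIO()
sample.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer)

print(list(without_index.columns))
print(list(with_index.columns))
```

When the index is written, it comes back as an extra column named `Unnamed: 0` on the next `read_csv`, which is why exporting with `index=False` keeps the file limited to the data columns.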