# Geographical location data of Toronto neighborhoods

## A dataframe is created that gives the latitude and longitude of each postal code, borough and neighborhood in Toronto, Canada (postal codes beginning with M)

```python
import pandas as pd
```

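```python
# read the Wikipedia page with pandas; pd.read_html returns a list of tables
# (it needs an HTML parser such as lxml installed), and [0] takes the first one
canada_df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
canada_df.head()
```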

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


The data has to be cleaned: rows with _Not assigned_ entries in __both__ the 'Borough' and 'Neighborhood' columns have to be dropped.

```python
# cleaning data

# drop rows that have 'Not assigned' in both the 'Borough' and 'Neighborhood' columns
not_assigned_both = (canada_df.Borough == 'Not assigned') & (canada_df.Neighborhood == 'Not assigned')
canada_df_filtered = canada_df[~not_assigned_both]

# reset the index after dropping rows; drop=True discards the old index
# instead of keeping it as an extra 'index' column
canada_df_filtered = canada_df_filtered.reset_index(drop=True)
```

```python
canada_df_filtered.head(10)
```

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |
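
A "both" rule and an "either" rule give different results only for a row whose borough is assigned but whose neighborhood is not. A minimal sketch on made-up rows (the codes `P1` to `P3` are hypothetical, not taken from the Wikipedia table) shows how the two boolean masks differ:

```python
import pandas as pd

# Hypothetical toy rows: both unassigned, only the neighborhood unassigned, fully assigned
df = pd.DataFrame({
    "Postal Code": ["P1", "P2", "P3"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Not assigned", "Parkwoods"],
})

borough_missing = df["Borough"] == "Not assigned"
neighborhood_missing = df["Neighborhood"] == "Not assigned"

# 'both' rule: drop a row only when both columns are 'Not assigned'
drop_if_both = df[~(borough_missing & neighborhood_missing)]

# 'either' rule: drop a row when at least one column is 'Not assigned'
drop_if_either = df[~borough_missing & ~neighborhood_missing]

print(drop_if_both["Postal Code"].tolist())    # ['P2', 'P3']
print(drop_if_either["Postal Code"].tolist())  # ['P3']
```

If the scraped table contains no row like `P2`, the two rules keep exactly the same rows.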


```python
# continue working with the cleaned dataframe under the original name
canada_df = canada_df_filtered
```

## Reading the location data into a pandas dataframe

The location data is a `.csv` file downloaded from the [link provided by Coursera](https://cocl.us/Geospatial_data). It is read into a pandas dataframe.

```python
location_data = pd.read_csv('Geospatial_Coordinates.csv')
location_data.head()
```

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
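
The file itself is not reproduced here. As a sketch, the five rows displayed above can be passed to `pd.read_csv` through an in-memory buffer (`io.StringIO`), which shows the layout the notebook expects. It also checks that the key column is spelled `Postal Code`, exactly as in the neighborhood table, because the later merge matches column names literally.

```python
import io

import pandas as pd

# The first five rows of Geospatial_Coordinates.csv, as displayed above
sample_csv = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476
"""

sample = pd.read_csv(io.StringIO(sample_csv))

# The merge key must be spelled exactly as in the neighborhood table
assert "Postal Code" in sample.columns

# Latitude and Longitude should be parsed as floating-point numbers
print(sample.dtypes)
```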


```python
location_data.shape
```

(103, 3)

The `location_data` dataframe has the same dimensions as the cleaned neighborhood dataframe, `canada_df`: 103 rows and 3 columns.
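
Equal row counts alone do not guarantee that the two tables describe the same postal codes. A minimal sketch with two hypothetical toy frames (`neighborhoods` and `coords` stand in for `canada_df` and `location_data`; their values are made up) compares the key sets directly:

```python
import pandas as pd

# Hypothetical stand-ins: same number of rows, but one key differs on each side
neighborhoods = pd.DataFrame({
    "Postal Code": ["P1", "P2", "P3"],
    "Neighborhood": ["A", "B", "C"],
})
coords = pd.DataFrame({
    "Postal Code": ["P1", "P2", "P9"],
    "Latitude": [43.7, 43.6, 43.8],
    "Longitude": [-79.3, -79.4, -79.2],
})

same_row_count = len(neighborhoods) == len(coords)
only_in_neighborhoods = set(neighborhoods["Postal Code"]) - set(coords["Postal Code"])
only_in_coords = set(coords["Postal Code"]) - set(neighborhoods["Postal Code"])

print(same_row_count)                   # True
print(only_in_neighborhoods)            # {'P3'}
print(only_in_coords)                   # {'P9'}
print(coords["Postal Code"].is_unique)  # True
```

Running the same comparison on `canada_df` and `location_data` would show whether every postal code has exactly one coordinate pair.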

The two dataframes can be combined with an __inner merge__ __on__ the `'Postal Code'` column, producing a compact dataframe that holds the borough and neighborhood names together with the location data (latitude and longitude) of each postal code.

```python
neighborhoods_loc = pd.merge(canada_df, location_data, how='inner', on=['Postal Code'])
neighborhoods_loc.head(10)
```

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
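
An inner merge keeps only the postal codes found in both dataframes and silently drops the rest. A sketch on hypothetical toy frames (`left` and `right` are made-up stand-ins) shows two optional `pd.merge` arguments that make this visible: `validate='one_to_one'` raises `pandas.errors.MergeError` if a postal code repeats on either side, and an outer merge with `indicator=True` adds a `_merge` column that flags the rows an inner merge would discard.

```python
import pandas as pd

# Hypothetical toy frames: 'P3' has no coordinates and 'P9' has no neighborhood
left = pd.DataFrame({"Postal Code": ["P1", "P2", "P3"], "Neighborhood": ["A", "B", "C"]})
right = pd.DataFrame({"Postal Code": ["P1", "P2", "P9"], "Latitude": [1.0, 2.0, 9.0]})

# Inner merge keeps only keys present in both frames;
# validate="one_to_one" raises MergeError if a key is duplicated on either side
inner = pd.merge(left, right, how="inner", on="Postal Code", validate="one_to_one")

# Outer merge with indicator=True labels each row 'both', 'left_only' or 'right_only'
audit = pd.merge(left, right, how="outer", on="Postal Code", indicator=True)
dropped = audit.loc[audit["_merge"] != "both", ["Postal Code", "_merge"]]

print(inner["Postal Code"].tolist())  # ['P1', 'P2']
print(dropped)
```

If `dropped` is empty for the real data, the inner merge lost no postal codes.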


The `neighborhoods_loc` dataframe contains clean data, ready for analysis.
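
A couple of quick checks can back up the claim that the merged data is clean. A minimal sketch, assuming a hypothetical helper `check_clean` and a deliberately flawed toy frame, counts rows with missing coordinates and rows that repeat a postal code; for clean data both counts should be zero.

```python
import pandas as pd


def check_clean(df: pd.DataFrame) -> dict:
    """Count rows with missing coordinates and rows that repeat a postal code."""
    missing = df[["Latitude", "Longitude"]].isna().any(axis=1)
    duplicated = df["Postal Code"].duplicated()
    return {
        "missing_coordinates": int(missing.sum()),
        "duplicate_postal_codes": int(duplicated.sum()),
    }


# Hypothetical toy frame with one missing latitude and one repeated postal code
toy = pd.DataFrame({
    "Postal Code": ["P1", "P2", "P2"],
    "Latitude": [43.7, None, 43.6],
    "Longitude": [-79.3, -79.4, -79.4],
})

print(check_clean(toy))  # {'missing_coordinates': 1, 'duplicate_postal_codes': 1}
```

Applied to the notebook's result, the call would be `check_clean(neighborhoods_loc)`.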