## Install Deps & Setup Import

In [52]:
%pip install lxml
import pandas as pd



## Read Table
Pandas has a read_html() function that parses the tables in an HTML page and returns them as a list of dataframes. The table we want is the first one Pandas finds, so we take element 0 of that list and assign it to the html_df variable.

In [53]:
html_df = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")[0]
html_df.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
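
To make the return type of read_html() concrete, here is a minimal sketch that runs offline. It assumes a small hand-written HTML snippet (hypothetical, not the Wikipedia page) and a parser such as lxml being installed. It also shows the match argument, which is one way to avoid relying on the table's position in the list.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table page, used only to illustrate the return type.
html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
<table>
  <tr><th>Key</th><th>Value</th></tr>
  <tr><td>a</td><td>1</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
first_table = tables[0]

# match keeps only the tables whose text matches the given string or regex.
borough_tables = pd.read_html(StringIO(html), match="Borough")
```

Indexing with [0] works as long as the page layout does not change; selecting by match is a little more robust if other tables are added before the one we want.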


## Remove Boroughs
Drop every row whose Borough value is "Not assigned".

In [54]:
df = html_df[~html_df["Borough"].str.contains("Not assigned")]
df.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |
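
The filter above uses str.contains, which is a substring match. As a minimal sketch on a hypothetical four-row sample (not the scraped data), an exact comparison with != expresses the same intent, assuming the placeholder always appears exactly as "Not assigned".

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of the scraped table.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# Exact comparison: keep rows whose Borough is not the placeholder.
assigned = sample[sample["Borough"] != "Not assigned"]

# Substring match, as used in the notebook; identical result on this sample.
assigned_contains = sample[~sample["Borough"].str.contains("Not assigned")]
same_result = assigned.equals(assigned_contains)
```

Note that the original row labels (2, 3, ...) are kept after filtering, which is why the output above starts at index 2.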


## Reassign Neighbourhoods
Some rows have a Borough but "Not assigned" as their Neighbourhood. For those rows we copy the Borough value into the Neighbourhood column.

In [55]:
df.loc[df['Neighbourhood']=='Not assigned', 'Neighbourhood'] = df['Borough']
df.head()

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M5A | Downtown Toronto | Regent Park |
| 6 | M6A | North York | Lawrence Heights |
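
The copy-of-a-slice warning appears because df was produced by filtering html_df, so Pandas cannot tell whether the assignment is meant to reach the original frame. A minimal sketch of one common way to avoid it, using a hypothetical three-row sample: take an explicit .copy() after filtering, then assign through .loc with the same mask on both sides.

```python
import pandas as pd

# Hypothetical sample: one unassigned row, one complete row,
# and one row with a Borough but no Neighbourhood.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Harbourfront", "Not assigned"],
})

# .copy() makes the filtered frame independent of the original,
# so later assignments are unambiguous.
assigned = sample[sample["Borough"] != "Not assigned"].copy()

# Fill placeholder Neighbourhood values with the Borough of the same row.
mask = assigned["Neighbourhood"] == "Not assigned"
assigned.loc[mask, "Neighbourhood"] = assigned.loc[mask, "Borough"]
```

The values end up the same as in the cell above; the only difference is that the assignment targets a frame Pandas knows is independent.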


## Grouping
We group by Postcode, keep the first Borough in each group, and join the group's Neighbourhood values with a comma and a space. We then reset the index so that Postcode is displayed as a regular column.

In [56]:
df = df.groupby('Postcode').agg({'Borough':'first', 
                                 'Neighbourhood': ', '.join
                                }).reset_index()
df.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
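
The effect of the aggregation is easiest to see on a few rows. The sketch below uses a hypothetical three-row sample in which M5A appears twice, and applies the same groupby and agg call as the cell above.

```python
import pandas as pd

# Hypothetical sample: M5A has two neighbourhoods, M6A has one.
sample = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Lawrence Heights"],
})

grouped = (
    sample.groupby("Postcode")
    .agg({
        "Borough": "first",           # every row in a group shares one Borough
        "Neighbourhood": ", ".join,   # concatenate the group's names
    })
    .reset_index()                    # turn Postcode back into a column
)
```

Taking the first Borough assumes that a postcode never spans more than one borough, which holds for the rows shown here.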


## Size
Output the shape of the dataframe: 103 rows and 3 columns.

In [57]:
df.shape

(103, 3)