# Neighborhoods of Canada

## Creating a Pandas dataframe

In [1]:
# importing pandas
import pandas as pd

In [2]:
# reading the first table on the Wikipedia page (read_html returns a list of dataframes)
canada_df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
canada_df.head()

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


In [3]:
canada_df.shape

(180, 3)

The dataframe has $180$ rows and $3$ columns.

As it turns out, this data needs to be cleaned: many entries are _Not assigned_ and have to be removed. Only rows in which __both__ the 'Borough' and 'Neighborhood' columns are _Not assigned_ will be dropped.

In [4]:
# cleaning data

# dropping only the rows that have 'Not assigned' in both the 'Borough' and 'Neighborhood' columns
both_not_assigned = (canada_df.Borough == 'Not assigned') & (canada_df.Neighborhood == 'Not assigned')
canada_df_filtered = canada_df[~both_not_assigned]

# resetting the index as rows were dropped; drop=True discards the old index instead of keeping it as a column
canada_df_filtered = canada_df_filtered.reset_index(drop=True)
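
The difference between dropping rows where __both__ columns are _Not assigned_ and dropping rows where __either__ column is can be seen on a small example. The following is a minimal sketch, assuming a made-up `toy_df` (including the invented postal code `M9Z`) rather than the real Wikipedia table:

```python
import pandas as pd

# Hypothetical toy data; 'M9Z' has a borough but no neighborhood.
toy_df = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M9Z", "M4A"],
    "Borough": ["Not assigned", "North York", "Etobicoke", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned", "Victoria Village"],
})

borough_missing = toy_df["Borough"] == "Not assigned"
neighborhood_missing = toy_df["Neighborhood"] == "Not assigned"

# Drop a row only when BOTH columns are 'Not assigned'.
both_missing_dropped = toy_df[~(borough_missing & neighborhood_missing)].reset_index(drop=True)

# Stricter alternative: drop a row when EITHER column is 'Not assigned'.
either_missing_dropped = toy_df[~(borough_missing | neighborhood_missing)].reset_index(drop=True)

print(len(both_missing_dropped))    # 3
print(len(either_missing_dropped))  # 2
```

With the first mask the `M9Z` row survives and can be inspected later; with the second it is silently removed along with the fully unassigned rows.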

In [5]:
canada_df_filtered.head(10)

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |


Next, the filtered data is searched for rows where the 'Borough' value is assigned __but__ the 'Neighborhood' value is still _Not assigned_.

In [6]:
# rows whose 'Neighborhood' is still 'Not assigned' after filtering
missing_neighborhood_values = canada_df_filtered[canada_df_filtered.Neighborhood == 'Not assigned']
missing_neighborhood_values.head()

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|


The dataframe is empty: no row has a missing 'Neighborhood' value together with an assigned 'Borough' value.
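
The same check can be made explicit by testing both conditions at once and using the `.empty` attribute instead of reading the printed output. This is a minimal sketch on made-up data (`toy_df` and the postal code `M9Z` are illustrative assumptions, not rows from the Wikipedia table):

```python
import pandas as pd

# Hypothetical toy data; 'M9Z' has a borough but no neighborhood.
toy_df = pd.DataFrame({
    "Postal Code": ["M3A", "M9Z", "M4A"],
    "Borough": ["North York", "Etobicoke", "North York"],
    "Neighborhood": ["Parkwoods", "Not assigned", "Victoria Village"],
})

# Rows with an assigned borough but a missing neighborhood.
needs_attention = toy_df[
    (toy_df["Borough"] != "Not assigned")
    & (toy_df["Neighborhood"] == "Not assigned")
]

if needs_attention.empty:
    print("No rows with an assigned borough and a missing neighborhood.")
else:
    print(f"{len(needs_attention)} row(s) need attention:")
    print(needs_attention["Postal Code"].tolist())
```

On the real data described above, the first branch would be the one taken, since the search returned no rows.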

Since the rest of the work uses the cleaned `canada_df_filtered` dataframe, it is reassigned to the shorter name `canada_df`.

In [7]:
canada_df = canada_df_filtered
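
Note that a plain assignment like this binds a second name to the same dataframe object; it does not create a copy. If an independent dataframe were wanted, `.copy()` would be one option. A minimal sketch with hypothetical names (`filtered_df`, `alias_df`, `copy_df`):

```python
import pandas as pd

# Hypothetical stand-in for the filtered dataframe.
filtered_df = pd.DataFrame({"Postal Code": ["M3A", "M4A"]})

alias_df = filtered_df          # same object under a second name
copy_df = filtered_df.copy()    # independent (deep) copy

print(alias_df is filtered_df)  # True
print(copy_df is filtered_df)   # False

# Changing the copy leaves the original untouched.
copy_df.loc[0, "Postal Code"] = "M5A"
print(filtered_df.loc[0, "Postal Code"])  # M3A
```

For this notebook the alias is sufficient, because the old name is not used again.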

### Measuring the dimensions of the `canada_df` dataframe

In [8]:
canada_df.shape

(103, 3)