WARNING
=========
<hr />

## Guys, please excuse my poor English. It's a foreign language for me, and I will try hard to write in it as well as I can.

<hr />

Step 1: Notebook created!!
--------------------------------------

<hr />

Step 2: Scrape the [wiki page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) and create a dataframe
-----------------------------------------------------------------------------------------------------------------------------------

#### Import some useful stuff:

In [1]:
import pandas as pd

import requests
from bs4 import BeautifulSoup # Great web scraping library!

#### Download and parse the web page:

In [2]:
page_link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page_response = requests.get(page_link, timeout=5) # Use GET to obtain the response
page_content = BeautifulSoup(page_response.content, 'html.parser') # Use html parser to parse the page

#### Let's extract the table from the content and build a pandas DataFrame from it:

To keep this notebook easy to follow, I moved all the bulky parsing code into the module `wiki_table_extractor`. You can check it out in the [project's GitHub repo](https://github.com/stolpa4/Coursera_Capstone/tree/master/Week3Assignment).

For this task we will use a specially prepared class, `WikiTableExtractor`. Let's import it.

In [3]:
from wiki_table_extractor import WikiTableExtractor

The class's method `parse_table_from_page()` takes one argument, our `BeautifulSoup` instance, and parses it into a `Table` object (described below). Now let's instantiate the class and call this method.

In [4]:
table_extractor = WikiTableExtractor()
table_extractor.parse_table_from_page(page_content)
extracted_table = table_extractor.table
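
The internals of `WikiTableExtractor` live in the repo and are not reproduced here. To give a rough idea of what such an extractor has to do, here is a minimal sketch, assuming the page holds an HTML table with the class `wikitable`. The function name `extract_table` and the inline sample HTML are hypothetical and are not the module's actual code.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the downloaded page (hypothetical sample, not the real wiki markup).
sample_html = """
<table class="wikitable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""


def extract_table(soup):
    """Return (titles, rows) from the first table with class 'wikitable'."""
    table = soup.find('table', class_='wikitable')
    titles, rows = [], []
    for tr in table.find_all('tr'):
        header_cells = tr.find_all('th')
        data_cells = tr.find_all('td')
        if header_cells and not titles:
            # The first row made of <th> cells gives the column titles.
            titles = [th.get_text(strip=True) for th in header_cells]
        elif data_cells:
            # Every row made of <td> cells is one table record.
            rows.append([td.get_text(strip=True) for td in data_cells])
    return titles, rows


sample_soup = BeautifulSoup(sample_html, 'html.parser')
titles, rows = extract_table(sample_soup)
```

The two returned lists correspond to the "titles" and "rows" that the `Table` object is described as holding.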

Our `Table` object, `extracted_table`, consists of two lists: one with the table titles and another with the table rows. Its `as_dict_list()` method returns the table rows as a list of dictionaries.
Let's obtain this list and use it to create a pandas DataFrame.

In [5]:
table_dict_list = extracted_table.as_dict_list()
canada_post_codes = pd.DataFrame(data=table_dict_list)
print('DataFrame shape: ', canada_post_codes.shape)
canada_post_codes.head(10)

DataFrame shape:  (289, 3)


Unnamed: 0,Borough,Neighbourhood,Postcode
0,Not assigned,Not assigned,M1A
1,Not assigned,Not assigned,M2A
2,North York,Parkwoods,M3A
3,North York,Victoria Village,M4A
4,Downtown Toronto,Harbourfront,M5A
5,Downtown Toronto,Regent Park,M5A
6,North York,Lawrence Heights,M6A
7,North York,Lawrence Manor,M6A
8,Queen's Park,Not assigned,M7A
9,Not assigned,Not assigned,M8A
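
For readers who do not want to open the repo: turning titles and rows into a list of dictionaries can be as simple as the sketch below. The standalone function `as_dict_list` here is a hypothetical stand-in, assuming each row has one cell per title; the real method may differ.

```python
import pandas as pd

# Values copied from the first rows of the table above.
titles = ['Postcode', 'Borough', 'Neighbourhood']
rows = [
    ['M1A', 'Not assigned', 'Not assigned'],
    ['M3A', 'North York', 'Parkwoods'],
]


def as_dict_list(titles, rows):
    """Pair each row's cells with the column titles (hypothetical stand-in)."""
    return [dict(zip(titles, row)) for row in rows]


dict_list = as_dict_list(titles, rows)
df = pd.DataFrame(data=dict_list)
```

Depending on the pandas version, the columns come out either in the order of `titles` or sorted alphabetically, as in the output above.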


#### Now it's time to refine our DataFrame!

First, let's get rid of all the rows whose borough is 'Not assigned'!  
Again, the code is encapsulated in a module, this time named `utils`. If you are interested in the implementation details, please check out [this module](https://github.com/stolpa4/Coursera_Capstone/blob/master/Week3Assignment/utils.py).

In [6]:
import utils

In [7]:
canada_post_codes = utils.filter_not_assigned_postcodes(canada_post_codes)
print('DataFrame shape: ', canada_post_codes.shape)
canada_post_codes.head(10)

DataFrame shape:  (212, 3)


Unnamed: 0,Borough,Neighbourhood,Postcode
0,North York,Parkwoods,M3A
1,North York,Victoria Village,M4A
2,Downtown Toronto,Harbourfront,M5A
3,Downtown Toronto,Regent Park,M5A
4,North York,Lawrence Heights,M6A
5,North York,Lawrence Manor,M6A
6,Queen's Park,Not assigned,M7A
7,Etobicoke,Islington Avenue,M9A
8,Scarborough,Rouge,M1B
9,Scarborough,Malvern,M1B
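
Judging by the output, the filter drops the rows whose borough is 'Not assigned' and renumbers the index (note that M7A stays, because only its neighbourhood is unassigned). A minimal sketch of such a filter, assuming exactly this behaviour, could look as follows. The name `filter_not_assigned` is hypothetical and this is not the actual `utils` code.

```python
import pandas as pd

# A few rows taken from the table above.
df = pd.DataFrame({
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
    'Postcode': ['M1A', 'M3A', 'M7A'],
})


def filter_not_assigned(frame):
    """Drop rows whose Borough is 'Not assigned' and renumber the index."""
    keep = frame['Borough'] != 'Not assigned'
    return frame[keep].reset_index(drop=True)


filtered = filter_not_assigned(df)
```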


The next step is to name every neighborhood that is 'Not assigned' (in fact, there is only one) after its borough.  
To see the result, look at row 6.

In [8]:
canada_post_codes = utils.name_not_assigned_neighborhoods(canada_post_codes)
canada_post_codes.head(10)

Unnamed: 0,Borough,Neighbourhood,Postcode
0,North York,Parkwoods,M3A
1,North York,Victoria Village,M4A
2,Downtown Toronto,Harbourfront,M5A
3,Downtown Toronto,Regent Park,M5A
4,North York,Lawrence Heights,M6A
5,North York,Lawrence Manor,M6A
6,Queen's Park,Queen's Park,M7A
7,Etobicoke,Islington Avenue,M9A
8,Scarborough,Rouge,M1B
9,Scarborough,Malvern,M1B
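
One possible way to do this renaming with a boolean mask is sketched below. The function name `name_not_assigned` is hypothetical, and the sketch only assumes the column names shown in the table; the real `utils` code may work differently.

```python
import pandas as pd

# Two rows taken from the table above, before the renaming.
df = pd.DataFrame({
    'Borough': ['North York', "Queen's Park"],
    'Neighbourhood': ['Lawrence Manor', 'Not assigned'],
    'Postcode': ['M6A', 'M7A'],
})


def name_not_assigned(frame):
    """Copy the borough name into every 'Not assigned' neighbourhood."""
    result = frame.copy()  # leave the caller's DataFrame untouched
    missing = result['Neighbourhood'] == 'Not assigned'
    result.loc[missing, 'Neighbourhood'] = result.loc[missing, 'Borough']
    return result


named = name_not_assigned(df)
```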


Finally, we need to combine the rows that share a postcode into one row, with the neighborhoods separated by commas.

In [9]:
canada_post_codes = utils.combine_rows_with_same_postcode(canada_post_codes)
print('DataFrame shape: ', canada_post_codes.shape)
canada_post_codes.tail(10)

DataFrame shape:  (103, 3)


Unnamed: 0,Postcode,Borough,Neighbourhood
93,M9A,Etobicoke,Islington Avenue
94,M9B,Etobicoke,"Cloverdale, Islington, Martin Grove, Princess ..."
95,M9C,Etobicoke,"Bloordale Gardens, Eringate, Markland Wood, Ol..."
96,M9L,North York,Humber Summit
97,M9M,North York,"Emery, Humberlea"
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv..."
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."
102,M9W,Etobicoke,Northwest
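
Combining rows like this is a typical job for `groupby`. Here is a minimal sketch, assuming that rows with the same postcode always share the same borough. The function name `combine_rows` is hypothetical and this is not necessarily how `utils` does it.

```python
import pandas as pd

# Four rows taken from the earlier table: two postcodes, two neighbourhoods each.
df = pd.DataFrame({
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York', 'North York'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights', 'Lawrence Manor'],
    'Postcode': ['M5A', 'M5A', 'M6A', 'M6A'],
})


def combine_rows(frame):
    """Join the neighbourhoods of each (Postcode, Borough) group with ', '."""
    return (
        frame.groupby(['Postcode', 'Borough'], as_index=False)['Neighbourhood']
             .agg(', '.join)
    )


combined = combine_rows(df)
```

Grouping also sorts the result by postcode, which matches the ordering of the table above.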


Now our data is structured properly and we can proceed to further steps.
<hr />

Step 4: Get the latitude and the longitude coordinates of each neighborhood 
--------------------------------------------------------------------------------------------------------------

#### Let's use geocoder indirectly (the code is in the `utils` module, but in fact it's just a copy of the code from the assignment description)

In [10]:
latitude_column = []
longitude_column = []

for index, row in canada_post_codes.iterrows():
    latitude, longitude = utils.get_latitude_longitude(row['Postcode'])
    latitude_column.append(latitude)
    longitude_column.append(longitude)
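
As a side note, the same loop can be written without `iterrows()`. The sketch below uses a hypothetical stand-in, `lookup_coordinates`, that returns placeholder numbers instead of calling a geocoding service, so the values are not real coordinates.

```python
import pandas as pd


def lookup_coordinates(postcode):
    """Hypothetical stand-in for utils.get_latitude_longitude (placeholder values)."""
    placeholder = {'M1B': (1.0, -1.0), 'M1C': (2.0, -2.0)}
    return placeholder[postcode]


df = pd.DataFrame({'Postcode': ['M1B', 'M1C']})

# Call the lookup once per postcode, then split the (lat, lng) pairs into two columns.
pairs = [lookup_coordinates(code) for code in df['Postcode']]
df['Latitude'], df['Longitude'] = zip(*pairs)
```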

Now that we have lists with the latitudes and longitudes of the neighborhoods, let's add them to our DataFrame.

In [12]:
canada_post_codes['Latitude'] = latitude_column
canada_post_codes['Longitude'] = longitude_column
print('DataFrame shape: ', canada_post_codes.shape)
canada_post_codes.head(10)

DataFrame shape:  (103, 5)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",0,0
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",0,0
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",0,0
3,M1G,Scarborough,Woburn,0,0
4,M1H,Scarborough,Cedarbrae,0,0
5,M1J,Scarborough,Scarborough Village,0,0
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",0,0
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",0,0
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",0,0
9,M1N,Scarborough,"Birch Cliff, Cliffside West",0,0


Oh no! Every coordinate is 0, so there must be some problem with the geocoder library. Let's try again using a CSV file with the coordinates.
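
A minimal sketch of the CSV approach, assuming the file has one row per postcode with the columns `Postal Code`, `Latitude` and `Longitude`. These column names are an assumption, and the inline CSV text holds placeholder numbers, not real coordinates.

```python
import io

import pandas as pd

# Placeholder CSV text; the column names and the values are assumptions.
csv_text = """Postal Code,Latitude,Longitude
M1B,1.0,-1.0
M1C,2.0,-2.0
"""

# Two rows taken from the table above, without the zero coordinates.
post_codes = pd.DataFrame({
    'Postcode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge, Malvern', 'Highland Creek, Rouge Hill, Port Union'],
})

coordinates = pd.read_csv(io.StringIO(csv_text))
coordinates = coordinates.rename(columns={'Postal Code': 'Postcode'})

# A left join keeps every postcode, even one missing from the CSV (its coordinates become NaN).
with_coordinates = post_codes.merge(coordinates, on='Postcode', how='left')

# Count the postcodes that did not get a latitude.
missing = with_coordinates['Latitude'].isna().sum()
```

With a real file, `pd.read_csv` would take the file path instead of the `io.StringIO` object, and a non-zero `missing` count would point at postcodes absent from the CSV.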