Question 1
Use pandas, the BeautifulSoup package, or any other way you are comfortable with, to transform the data in the table on the Wikipedia page into the above pandas dataframe.

Import the requests library to download the raw HTML of the Wikipedia page.

In [36]:
import requests
data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [40]:
# Install BeautifulSoup first if needed (pip install beautifulsoup4).
# Python's built-in 'html.parser' is used below, so the lxml parser is optional.
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')
print(soup.prettify())



Inspecting the HTML shows that the data sits in a table element with class 'wikitable sortable'.
Let's extract that table.

In [44]:
Table_data= soup.find('table',{'class':'wikitable sortable'})
print(Table_data)
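Searching with the exact string 'wikitable sortable' only matches when the class attribute lists those two classes in exactly that order. A CSS selector is less brittle. The sketch below compares the two lookups on a small hypothetical HTML snippet (not the real page) whose classes appear in the opposite order.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: same classes as the Wikipedia table, but in a different order
html = """
<table class="sortable wikitable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

# Exact-string match on the class attribute: depends on the order of the classes
exact = soup.find('table', {'class': 'wikitable sortable'})

# CSS selector: matches any table carrying both classes, in any order
robust = soup.select_one('table.wikitable.sortable')

print(exact is None)        # True
print(robust is not None)   # True
```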

In [45]:
print(Table_data.tr.text)


Postal Code

Borough

Neighborhood



As we can see, the header row contains three columns: Postal Code, Borough and Neighborhood.
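Instead of typing the column names by hand, they can also be read from the th cells of the first row. A minimal sketch on a hypothetical copy of the header row printed above; the rename mapping to 'Postalcode' is an assumption chosen to match the names used in this notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical table mirroring the header row printed above
html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""
table = BeautifulSoup(html, 'html.parser').find('table')

# Column names as they appear in the <th> cells of the first row
raw_headers = [th.get_text(strip=True) for th in table.tr.find_all('th')]

# Assumed mapping to the notebook's own column names; unknown names pass through unchanged
rename = {'Postal Code': 'Postalcode'}
headers = [rename.get(h, h) for h in raw_headers]
print(headers)  # ['Postalcode', 'Borough', 'Neighborhood']
```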

In [228]:
import pandas as pd
headers = ['Postalcode', 'Borough', 'Neighborhood']
df = pd.DataFrame(columns=headers)

Now loop over the table rows ('tr'), collect the text of each data cell ('td'), and add every complete three-cell row to the DataFrame.

In [229]:
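for tr_cell in Table_data.find_all('tr'):
    # Text of every data cell in this row (the header row has only <th> cells, so it yields [])
    row_data = [td_cell.text.strip() for td_cell in tr_cell.find_all('td')]
    # Keep only complete rows: Postal Code, Borough, Neighborhood
    if len(row_data) == 3:
        df.loc[len(df)] = row_data
df.head()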

,Postalcode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
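Appending with df.loc[len(df)] works, but it grows the DataFrame one row at a time. A common alternative is to collect the rows in a plain list and build the DataFrame once at the end. A sketch on a small hypothetical two-row excerpt of the table:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical two-row excerpt of the table
html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""
table = BeautifulSoup(html, 'html.parser').find('table')

rows = []
for tr in table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    # The header row has only <th> cells, so it yields an empty list and is skipped
    if len(cells) == 3:
        rows.append(cells)

# Build the DataFrame in one step instead of growing it row by row
df = pd.DataFrame(rows, columns=['Postalcode', 'Borough', 'Neighborhood'])
print(df)
```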


Now let's clean our data.

Remove the rows whose Borough is 'Not assigned'.

In [230]:
index = df[df['Borough']== 'Not assigned'].index
df.drop(index, inplace= True)
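The same filter can be written as a boolean mask, which avoids looking up the index labels separately. A minimal sketch on hypothetical rows shaped like the scraped table:

```python
import pandas as pd

# Hypothetical rows in the same shape as the scraped table
df = pd.DataFrame({
    'Postalcode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows with an assigned Borough; the surviving rows keep their original index labels
df = df[df['Borough'] != 'Not assigned']
print(df)
```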


In [231]:
df.head(10)

,Postalcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


If a row has a Borough but its Neighborhood is 'Not assigned', the Neighborhood takes the same value as the Borough.

In [232]:
df.loc[df['Neighborhood']== 'Not assigned','Neighborhood']= df['Borough']
df.head(10)

,Postalcode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"
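None of the first ten rows had an unassigned Neighborhood, so the output above looks unchanged. The sketch below uses hypothetical rows, one of them with 'Not assigned', to show the replacement actually happening; it uses Series.mask, which is an equivalent alternative to the .loc assignment.

```python
import pandas as pd

# Hypothetical rows: the second has a Borough but no Neighborhood
df = pd.DataFrame({
    'Postalcode': ['M3A', 'X0X'],
    'Borough': ['North York', 'Example Borough'],
    'Neighborhood': ['Parkwoods', 'Not assigned'],
})

# Where Neighborhood is 'Not assigned', take the value from Borough in the same row
df['Neighborhood'] = df['Neighborhood'].mask(df['Neighborhood'] == 'Not assigned', df['Borough'])
print(df)
```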


Now combine the neighborhoods that share the same postal code into a single row, separated by commas.

In [233]:
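# One row per postal code: group on Postalcode and Borough, and join the Neighborhood strings
final = df.groupby(['Postalcode', 'Borough'], sort=False)['Neighborhood'].agg(', '.join)
df_new = final.reset_index()
# Keep the column order used in the rest of the notebook
df_new = df_new[['Postalcode', 'Neighborhood', 'Borough']]
df_new.head(10)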

,Postalcode,Neighborhood,Borough
0,M3A,Parkwoods,North York
1,M4A,Victoria Village,North York
2,M5A,"Regent Park, Harbourfront",Downtown Toronto
3,M6A,"Lawrence Manor, Lawrence Heights",North York
4,M7A,"Queen's Park, Ontario Provincial Government",Downtown Toronto
5,M9A,"Islington Avenue, Humber Valley Village",Etobicoke
6,M1B,"Malvern, Rouge",Scarborough
7,M3B,Don Mills,North York
8,M4B,"Parkview Hill, Woodbine Gardens",East York
9,M5B,"Garden District, Ryerson",Downtown Toronto


Let's check the shape of the resulting DataFrame.

In [234]:
df_new.shape

(103, 3)
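Beyond the shape, a couple of quick assertions can confirm that the cleaning steps worked. A sketch on a small hypothetical frame with the same columns as df_new:

```python
import pandas as pd

# Hypothetical cleaned frame in the same layout as df_new
df_new = pd.DataFrame({
    'Postalcode': ['M3A', 'M4A', 'M5A'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Regent Park, Harbourfront'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})

# Each postal code should now appear exactly once
assert df_new['Postalcode'].is_unique
# No 'Not assigned' values should survive the cleaning steps
assert not (df_new == 'Not assigned').any().any()
print(df_new.shape)  # (3, 3)
```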

Question 2
Use the Geocoder package or the csv file to create a dataframe with the latitude and longitude values.

We use the CSV file from the given link to get the latitude and longitude of each postal code.

In [235]:

!wget -q -O 'Toronto_long_lat_data.csv'  http://cocl.us/Geospatial_data
df_lon_lat = pd.read_csv('Toronto_long_lat_data.csv')
df_lon_lat.head()


,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [236]:
# Rename 'Postal Code' so the key column matches df_new
df_lon_lat.columns = ['Postalcode', 'Latitude', 'Longitude']
df_lon_lat.head()

,Postalcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
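Assigning a new list to columns relies on the order of the columns in the file. Renaming by name is a little safer. A sketch that inlines the first two rows of the CSV shown above, so it runs without downloading anything:

```python
import io
import pandas as pd

# First rows of the geospatial CSV as shown above, inlined so the sketch runs offline
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
"""
df_lon_lat = pd.read_csv(io.StringIO(csv_text))

# Rename by name rather than by position, so a reordered file cannot silently mislabel columns
df_lon_lat = df_lon_lat.rename(columns={'Postal Code': 'Postalcode'})
print(df_lon_lat.head())
```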


In [238]:
Toronto_df = pd.merge(df_new,
                 df_lon_lat[['Postalcode','Latitude', 'Longitude']],
                 on='Postalcode')
Toronto_df.head(10)

,Postalcode,Neighborhood,Borough,Latitude,Longitude
0,M3A,Parkwoods,North York,43.753259,-79.329656
1,M4A,Victoria Village,North York,43.725882,-79.315572
2,M5A,"Regent Park, Harbourfront",Downtown Toronto,43.65426,-79.360636
3,M6A,"Lawrence Manor, Lawrence Heights",North York,43.718518,-79.464763
4,M7A,"Queen's Park, Ontario Provincial Government",Downtown Toronto,43.662301,-79.389494
5,M9A,"Islington Avenue, Humber Valley Village",Etobicoke,43.667856,-79.532242
6,M1B,"Malvern, Rouge",Scarborough,43.806686,-79.194353
7,M3B,Don Mills,North York,43.745906,-79.352188
8,M4B,"Parkview Hill, Woodbine Gardens",East York,43.706397,-79.309937
9,M5B,"Garden District, Ryerson",Downtown Toronto,43.657162,-79.378937
