<h1> Part 1: Canadian Postal Codes </h1>

<h3> Imports the necessary libraries </h3>

In [5]:
conda install -c anaconda lxml

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - lxml


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.1.1   |                0         132 KB  anaconda
    certifi-2020.4.5.1         |           py36_0         159 KB  anaconda
    libxslt-1.1.33             |       h7d1a2b0_0         577 KB  anaconda
    lxml-4.5.1                 |   py36hefd8a0e_0         1.4 MB  anaconda
    openssl-1.1.1g             |       h7b6447c_0         3.8 MB  anaconda
    ------------------------------------------------------------
                                           Total:         6.0 MB

The following NEW packages will be INSTALLED:

  libxslt            anaconda/linux-64::libxslt-1.1.33-h7d1a2b0_0
  lxml               anaconda/linu

In [6]:
import requests
import lxml.html as lh
import pandas as pd

<h3> Gets the html and finds specific tags </h3>

In [7]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
#Create a handle, page, to handle the contents of the website
page = requests.get(url)
#Store the contents of the website under doc
doc = lh.fromstring(page.content)
#Parse data that are stored between <tr>..</tr> of HTML
tr_elements = doc.xpath('//tr')
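
The same `//tr` lookup can be tried offline. The block below is a minimal sketch, assuming a tiny hand-written page (`SAMPLE_HTML`) in place of the Wikipedia download; `parse_rows` is a hypothetical helper, not part of lxml.

```python
import lxml.html as lh

# Hypothetical stand-in for the downloaded page, using the same three columns.
SAMPLE_HTML = """
<html><body><table>
<tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table></body></html>
"""

def parse_rows(html_text):
    """Return every <tr> in html_text as a list of stripped cell strings."""
    doc = lh.fromstring(html_text)
    # Iterating over a <tr> element yields its child cells (<th> or <td>).
    return [[cell.text_content().strip() for cell in tr]
            for tr in doc.xpath('//tr')]

rows = parse_rows(SAMPLE_HTML)
header, body = rows[0], rows[1:]
print(header)
print(body)
```

Because `//tr` matches every table row on a page, the first row is only a usable header if the target table comes first, as it does in this notebook.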

<h3> Stores the first row as header </h3>

In [8]:
#Check the length of the first 12 rows
[len(T) for T in tr_elements[:12]]

tr_elements = doc.xpath('//tr')
#Create empty list
col=[]
i=0
#For each cell in the first row, store its text (the header) with an empty list
for t in tr_elements[0]:
    i+=1
    name=t.text_content()
    print('%d:"%s"'%(i,name))
    col.append((name,[]))


1:"Postal Code
"
2:"Borough
"
3:"Neighborhood
"


<h3> Goes through the rows and stores the values </h3>

In [9]:
#Since our first row is the header, data is stored on the second row onwards
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    
    #If row is not of size 3, the //tr data is not from our table
    if len(T)!=3:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        
        data=t.text_content() 
        #Skip the first column (postal code); for the other columns,
        #convert purely numeric values to integers
        if i>0:
            try:
                data=int(data)
            except ValueError:
                pass
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1
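
The accumulation pattern in the loop above can be isolated from the scraping. This is a minimal sketch, assuming the rows are already plain lists of strings; `rows_to_columns` is a hypothetical helper name.

```python
def rows_to_columns(header, rows):
    """Collect row-oriented data into (name, values) pairs, one per column.

    Stops at the first row whose width differs from the header,
    like the `break` in the loop above.
    """
    columns = [(name, []) for name in header]
    for row in rows:
        if len(row) != len(header):
            break
        for (_, values), cell in zip(columns, row):
            values.append(cell)
    return columns

# The first two rows appear in this notebook's table; the short last row
# is a hypothetical <tr> that does not belong to the table.
header = ['Postal Code', 'Borough', 'Neighborhood']
rows = [
    ['M3A', 'North York', 'Parkwoods'],
    ['M4A', 'North York', 'Victoria Village'],
    ['footer'],
]
columns = rows_to_columns(header, rows)
print({name: len(values) for name, values in columns})
```

Every column ends up with the same length, which is what the length check in the next cell confirms for the real data.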

In [10]:
[len(C) for (title,C) in col]   #Displays the length of each column

[181, 181, 181]

<h3> Stores the column names appropriately into the data frame </h3>

In [11]:
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)
df.columns = ['Postal Code', 'Borough', 'Neighborhood'] 
print(df.columns.tolist())

['Postal Code', 'Borough', 'Neighborhood']


In [12]:
#Gets rid of the line break symbol
new_df = df.replace('\n', '', regex=True)

In [13]:
#Drops the last row as it is unnecessary for our table
new_df.drop(new_df.tail(1).index, inplace=True)
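
The two clean-up steps can be checked on a toy frame. The sketch below assumes cells that end in a line break, like the scraped ones; `raw` and the variable names are illustrative only. It compares the regex replace used here with a per-column `str.strip()`, which would be one possible alternative.

```python
import pandas as pd

# Toy frame with the trailing line breaks seen in the scraped cells.
raw = pd.DataFrame({
    'Postal Code': ['M3A\n', 'M4A\n'],
    'Borough': ['North York\n', 'North York\n'],
    'Neighborhood': ['Parkwoods\n', 'Victoria Village\n'],
})

# Option 1: remove every newline anywhere in the frame.
cleaned_replace = raw.replace('\n', '', regex=True)

# Option 2: strip leading and trailing whitespace column by column.
cleaned_strip = raw.apply(lambda column: column.str.strip())

print(cleaned_replace.equals(cleaned_strip))
```

The two options differ only when a cell contains a newline in the middle of its text: the replace removes it, while `str.strip()` leaves it in place.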

<h3> Drops rows whose Borough is not assigned and replaces unassigned Neighborhoods with the Borough name </h3>

In [14]:
#Drops not assigned rows; .copy() avoids SettingWithCopyWarning on the assignment below
canada_df = new_df[new_df['Borough'] != 'Not assigned'].copy()

#Replaces not assigned neighborhoods with borough name
not_assigned = canada_df['Neighborhood'] == 'Not assigned'
canada_df.loc[not_assigned, 'Neighborhood'] = canada_df.loc[not_assigned, 'Borough']
canada_df


Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
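
The filter-and-fallback step can be tried on a small frame. This is a minimal sketch with made-up rows (`toy` is hypothetical data, not taken from the scraped table), showing one way to do the assignment through `.loc` on an explicit copy.

```python
import pandas as pd

# Hypothetical rows; 'Not assigned' mimics the placeholder in the scraped table.
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# .copy() makes the filtered frame independent of `toy`, so the assignment
# below does not raise SettingWithCopyWarning.
assigned = toy[toy['Borough'] != 'Not assigned'].copy()

# Where the neighborhood is a placeholder, fall back to the borough name.
missing = assigned['Neighborhood'] == 'Not assigned'
assigned.loc[missing, 'Neighborhood'] = assigned.loc[missing, 'Borough']

print(assigned)
```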


<h3> Displays the shape of the dataframe </h3>

In [15]:
canada_df.shape

(103, 3)

<h1> Part 2: Merging Locations to the dataset </h1>

<h3> Reads the csv file from online and displays it </h3>

In [16]:
location_df = pd.read_csv('http://cocl.us/Geospatial_data')
location_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<h3> Merges two dataframes into one based on postal code </h3>

In [17]:
#Inner join keeps only postal codes present in both dataframes
canada_location = canada_df.merge(location_df, on='Postal Code', how='inner')

In [24]:
canada_location.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
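
An inner join silently drops postal codes that have no coordinates. The sketch below shows one way to audit that, using a left join with `indicator=True`. The first two rows come from the tables above; `M0X` is a hypothetical code added only to show the effect.

```python
import pandas as pd

boroughs = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M0X'],  # 'M0X' is hypothetical
    'Borough': ['North York', 'North York', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# Inner join: only postal codes present in both frames survive.
inner = boroughs.merge(coords, on='Postal Code', how='inner')

# Left join with indicator=True adds a `_merge` column flagging unmatched rows.
audit = boroughs.merge(coords, on='Postal Code', how='left', indicator=True)
unmatched = audit.loc[audit['_merge'] == 'left_only', 'Postal Code'].tolist()

print(len(inner), unmatched)
```

If `unmatched` is empty on the real data, the inner join has lost nothing.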


<h1> Part 3: Geomapping </h1>

<h3> Installs Folium package for geomapping </h3>

In [20]:
!conda install -c conda-forge folium=0.5.0 --yes 
import folium 
print("imported")

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    brotlipy-0.7.0             |py36h8c4c3a4_1000         346 KB  conda-forge
    ca-certificates-2020.4.5.2 |       hecda079_0         147 KB  conda-forge
    certifi-2020.4.5.2         |   py36h9f0ad1d_0         152 KB  conda-forge
    chardet-3.0.4              |py36h9f0ad1d_1006         188 KB  conda-forge
    cryptography-2.9.2         |   py36h45

<h3> Creates the map and tags the Boroughs </h3>

In [33]:
# Create the map; the name toronto_map avoids shadowing the built-in map()
toronto_map = folium.Map(location=[43.733259, -79.329656], zoom_start=11)

# Add one marker per postal code, with the borough name as the popup
for lat, lng, borough in zip(canada_location['Latitude'],
                             canada_location['Longitude'],
                             canada_location['Borough']):
    folium.Marker([lat, lng], popup=borough).add_to(toronto_map)

toronto_map
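
The map centre above is hard-coded. One possible alternative is to derive it from the data, for example as the mean of the marker coordinates. The sketch below uses only the five coordinate pairs shown in the `canada_location.head()` output; `centroid` is a hypothetical helper.

```python
def centroid(points):
    """Return the mean [latitude, longitude] of a non-empty list of points."""
    lats = [lat for lat, _ in points]
    lngs = [lng for _, lng in points]
    return [sum(lats) / len(points), sum(lngs) / len(points)]

# The five coordinate pairs from the canada_location.head() output.
sample_points = [
    [43.753259, -79.329656],
    [43.725882, -79.315572],
    [43.65426, -79.360636],
    [43.718518, -79.464763],
    [43.662301, -79.389494],
]
centre = centroid(sample_points)
print(centre)
```

On the full frame, `canada_location[['Latitude', 'Longitude']].mean().tolist()` would give the same kind of pair, which could then be passed as `location=` to `folium.Map`.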
