# Segmenting and Clustering Neighborhoods in Toronto - Part Two.
 
#### This Jupyter Notebook gathers, analyzes, and displays data. The submission has _three_ parts.
**PART ONE**. 
    Initial notebook, submitted to my **GitHub repository**, listing the PostalCode, Borough, and Neighborhood for each postal code and omitting the "Not assigned" entries. The data is scraped from a table on a Wikipedia page.
    
**PART TWO**. 
    Adds **Latitude and Longitude** data to the Part One data (PostalCode, Borough, and Neighborhood).
    
##### PIPELINE:
- Import the code libraries and extract the web page data set
- Explore the web page elements: PostalCode, Borough, and Neighborhood
- Analyze the values of each Neighborhood group and eliminate the "Not assigned" rows
- Submit to the GitHub repository
 

### Part One.

#### Bring in the libraries to assist with the data tasks

In [1]:
#import the library to open URLs
import urllib.request

In [2]:
#import the BeautifulSoup library to parse the HTML and XML
from bs4 import BeautifulSoup

In [3]:
#import pandas for the DataFrame
import pandas as pd

#### Web page

In [4]:
#URL of the Wikipedia page "List of postal codes of Canada: M", scraped for the DataFrame data
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"


In [5]:
#open the URL using urllib.request and store the HTML response in the page variable
page = urllib.request.urlopen(url)

In [6]:
#parse the page into a BeautifulSoup object (the "lxml" parser requires the lxml package)
soup=BeautifulSoup(page, "lxml")

In [7]:
#gather all the tables on the page (all_tables is not used further below)
all_tables=soup.find_all("table")


In [8]:
#toronto_table is the target table: the first table whose class attribute is exactly "wikitable sortable"
toronto_table=soup.find('table', class_ = 'wikitable sortable')
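Matching on `class_` with a space-separated string compares against the whole `class` attribute value, so it only finds tables whose classes appear in exactly that order. The sketch below runs on a small made-up HTML snippet rather than the live Wikipedia page and contrasts that with a CSS selector via `select_one`, which requires both classes in any order. The snippet and variable names are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia page: two tables, and the one we
# want lists its classes in the opposite order ("sortable wikitable").
HTML = """
<table class="infobox"><tr><td>ignore me</td></tr></table>
<table class="sortable wikitable"><tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class_ with a space-separated string must equal the full attribute value,
# so a different class order does not match.
exact = soup.find("table", class_="wikitable sortable")

# A CSS selector requires both classes but ignores their order.
by_css = soup.select_one("table.wikitable.sortable")

print(exact)                 # None
print(by_css.td.get_text())  # M3A
```

On the real page both forms find the same table as long as Wikipedia keeps the class order "wikitable sortable"; the selector form is simply less sensitive to that order.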


In [9]:
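#loop through the table rows (tr) and append each row's data cells (td) to the lists
PostalCode=[]
Borough=[]
Neighborhood=[]

for row in toronto_table.find_all('tr'):
    cells=row.find_all('td')

    #only data rows have exactly three td cells; the header row uses th cells
    if len(cells) == 3:
        #find(string=True) returns the first text node of each cell
        PostalCode.append(cells[0].find(string=True))
        Borough.append(cells[1].find(string=True))
        Neighborhood.append(cells[2].find(string=True))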
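A variation on the row loop, sketched against a hypothetical HTML snippet: `get_text().strip()` reads all the text in a cell and trims the surrounding whitespace, so the trailing `\n` never reaches the lists and the later `replace('\n','',regex=True)` step becomes unnecessary. It also keeps the full text if a cell ever wraps its names in links, where `find(string=True)` stops at the first text node. The `extract_rows` helper and the sample rows are hypothetical, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical three-column table in the same layout as the scraped one:
# a header row of <th> cells, then one <td> row per postal code.
HTML = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned\n</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods\n</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td><a>Regent Park</a>, <a>Harbourfront</a>\n</td></tr>
</table>
"""

def extract_rows(table):
    """Return (postal_code, borough, neighborhood) tuples for each data row."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) == 3:  # the header row has no <td> cells, so it is skipped
            # get_text() joins every text node; strip() trims the outer "\n"
            rows.append(tuple(cell.get_text().strip() for cell in cells))
    return rows

soup = BeautifulSoup(HTML, "html.parser")
rows = extract_rows(soup.find("table", class_="wikitable sortable"))
print(rows[2])  # ('M5A', 'Downtown Toronto', 'Regent Park, Harbourfront')

# For comparison, the first-text-node approach stops inside the first link.
m5a_cell = soup.find_all("tr")[3].find_all("td")[2]
print(m5a_cell.find(string=True))  # Regent Park
```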
      


***Sanity Check.*** _The assignment instructions state: if a cell has a borough but a "Not assigned" neighborhood, the neighborhood takes the same name as the borough. After examining the Borough and Neighborhood lists, this condition does not occur in the data, so this is a *NOP* (no operation) step._
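Had the condition occurred, the rule could be applied with a boolean mask in pandas. This is a minimal sketch on made-up rows (the postal code `M9Z` and its row are hypothetical), and it assumes the trailing `\n` has already been stripped, as In [11] does.

```python
import pandas as pd

# Hypothetical rows; the third one is the case the instructions describe.
df = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M9Z"],
    "Borough": ["Not assigned", "North York", "Etobicoke"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Rows whose borough is assigned but whose neighborhood is not.
mask = (df["Borough"] != "Not assigned") & (df["Neighborhood"] == "Not assigned")
print(mask.sum())  # 1 for these made-up rows

# Apply the rule: copy the borough name into the neighborhood column.
df.loc[mask, "Neighborhood"] = df.loc[mask, "Borough"]
print(df)
```

On the scraped data the check above found no such rows, so `mask.sum()` would be 0 there and the assignment would change nothing.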

In [10]:
#Sanity check: NOP, no operation needed for the 3rd and 4th instruction bullets
#display the Neighborhood list, including the "Not assigned" entries

Neighborhood

['Not assigned\n',
 'Not assigned\n',
 'Parkwoods\n',
 'Victoria Village\n',
 'Regent Park, Harbourfront\n',
 'Lawrence Manor, Lawrence Heights\n',
 "Queen's Park, Ontario Provincial Government\n",
 'Not assigned\n',
 'Islington Avenue, Humber Valley Village\n',
 'Malvern, Rouge\n',
 'Not assigned\n',
 'Don Mills\n',
 'Parkview Hill, Woodbine Gardens\n',
 'Garden District, Ryerson\n',
 'Glencairn\n',
 'Not assigned\n',
 'Not assigned\n',
 'West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale\n',
 'Rouge Hill, Port Union, Highland Creek\n',
 'Not assigned\n',
 'Don Mills\n',
 'Woodbine Heights\n',
 'St. James Town\n',
 'Humewood-Cedarvale\n',
 'Not assigned\n',
 'Not assigned\n',
 'Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood\n',
 'Guildwood, Morningside, West Hill\n',
 'Not assigned\n',
 'Not assigned\n',
 'The Beaches\n',
 'Berczy Park\n',
 'Caledonia-Fairbanks\n',
 'Not assigned\n',
 'Not assigned\n',
 'Not assigned\n',
 'Woburn\n',
 'Not assigned\n',
 ...]

#### Dataframe

In [11]:
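#load the lists into a DataFrame
df=pd.DataFrame(PostalCode,columns=['Postal Code'])
df['Borough']=Borough
df['Neighborhood']=Neighborhood
#remove the newline characters left over from the HTML cells
df=df.replace('\n','',regex=True)
#drop the rows whose Borough is "Not assigned"
df1=df[df.Borough != 'Not assigned']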
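The cleaning in In [11] can also be written with `str.strip()`, which removes the newline along with any other surrounding whitespace, followed by `reset_index(drop=True)` so the filtered frame is numbered from 0 again (boolean filtering, as in `df1`, keeps the original row labels). A sketch on a few hypothetical rows shaped like the scraped lists:

```python
import pandas as pd

# Hypothetical raw lists, shaped like the scraped ones (trailing "\n" included).
PostalCode = ["M1A\n", "M3A\n", "M5A\n"]
Borough = ["Not assigned\n", "North York\n", "Downtown Toronto\n"]
Neighborhood = ["Not assigned\n", "Parkwoods\n", "Regent Park, Harbourfront\n"]

df = pd.DataFrame({
    "Postal Code": PostalCode,
    "Borough": Borough,
    "Neighborhood": Neighborhood,
})

# Strip surrounding whitespace from every column (all columns hold strings).
df = df.apply(lambda col: col.str.strip())

# Keep rows with an assigned borough and renumber the index from 0.
df1 = df[df["Borough"] != "Not assigned"].reset_index(drop=True)
print(df1.shape)  # (2, 3)
```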


#### Report

In [12]:
#report the dimensions of the filtered DataFrame df1
print('Postal Codes of Canada')
shape = df1.shape
print('Number of rows :', shape[0])
print('Number of columns :', shape[1])

Postal Codes of Canada
Number of rows : 103
Number of columns : 3


#### End of Part One.

### Part Two.

In [13]:
#Geographical coordinates of the Toronto Neighborhoods aligned with PostalCodes
!wget -O geospatial.csv http://cocl.us/Geospatial_data

--2020-06-17 00:10:17--  http://cocl.us/Geospatial_data
Resolving cocl.us (cocl.us)... 158.85.108.86, 158.85.108.83, 169.48.113.194
Connecting to cocl.us (cocl.us)|158.85.108.86|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cocl.us/Geospatial_data [following]
--2020-06-17 00:10:18--  https://cocl.us/Geospatial_data
Connecting to cocl.us (cocl.us)|158.85.108.86|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-06-17 00:10:19--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.26.197
Connecting to ibm.box.com (ibm.box.com)|107.152.26.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2020-06-17 00:10:20--  https://ibm.box.com/public
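Outside a notebook, where the `!wget` shell escape is not available, the same file could be fetched with the standard library; `urllib.request.urlopen` follows HTTP redirects such as the 301 chain in the log above by default. The `download` helper below is a hypothetical name, and this is a sketch rather than the notebook's own method.

```python
import urllib.request
from pathlib import Path

def download(url: str, dest: str) -> Path:
    """Hypothetical helper: save the resource at `url` to `dest`.

    urlopen's default handlers follow HTTP redirects, so the final
    response body is what gets written.
    """
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    path = Path(dest)
    path.write_bytes(data)
    return path

# In the notebook this would replace the wget cell:
# download("http://cocl.us/Geospatial_data", "geospatial.csv")
```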

In [14]:
df2 = pd.read_csv('geospatial.csv')
df2.head(11)

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
| 5 | M1J | 43.744734 | -79.239476 |
| 6 | M1K | 43.727929 | -79.262029 |
| 7 | M1L | 43.711112 | -79.284577 |
| 8 | M1M | 43.716316 | -79.239476 |
| 9 | M1N | 43.692657 | -79.264848 |


In [15]:
#inner join on the column both DataFrames share, 'Postal Code'
df_table = pd.merge(df1, df2, on='Postal Code')
df_table.head(11)


|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
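`pd.merge(df1, df2)` performs an inner join on every column the two frames share (here only `Postal Code`), so a postal code missing from `geospatial.csv` would be dropped without warning. A stricter variant, sketched on made-up subsets of the two frames (`M9Z` and its row are hypothetical), uses a left join, `validate="one_to_one"` to raise a `MergeError` if either side repeats a postal code, and `indicator=True` to flag rows that found no coordinates.

```python
import pandas as pd

# Hypothetical subsets of df1 (scraped) and df2 (geospatial.csv).
df1 = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # M9Z is made up, to show a missing match
    "Borough": ["North York", "North York", "Etobicoke"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Example"],
})
df2 = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Left join keeps every scraped row; validate checks the keys are unique on
# both sides, and indicator adds a "_merge" column describing each match.
merged = pd.merge(df1, df2, on="Postal Code", how="left",
                  validate="one_to_one", indicator=True)

# Postal codes that found no coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)  # ['M9Z']
```

On the real data, an empty `missing` list would confirm that every scraped postal code received coordinates, in which case the left join and the notebook's inner join return the same rows.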


### End of Part Two.
