# Toronto Neighbourhoods - geocodes
#### This is part of the Course [<u>*Applied Data Science Capstone*</u>](https://www.coursera.org/learn/applied-data-science-capstone/) on Coursera, to complete the Specialization <u>*IBM Data Science Professional Certificate*</u>

This exercise retrieves the geocodes for the Toronto neighbourhoods we collected in the first [notebook](https://github.com/rareal/Coursera_Capstone/blob/master/Toronto_Neighborhoods.ipynb). We look up the latitude and longitude from the postcodes in the dataframe. 
The instructions suggest using the `geocoder` package, but it was not working for me. After looking around I found https://my.locationiq.com and got a developer token, which allows 10,000 free calls a day. See the **LocationIQ** [API docs](https://locationiq.com/docs#forward-geocoding).


[Other Google Maps alternatives](http://geoawesomeness.com/google-maps-api-alternatives-best-cheap-affordable/).

---------------
Importing dependencies:

In [27]:
import requests
import pandas as pd
import numpy as np
import time

In [28]:
apikey = '3519d86646e89c'

LocationIQ search API call example:

In [29]:
# Search / Forward Geocoding url
search_url = "https://us1.locationiq.com/v1/search.php"
data = {
    'key': apikey,
    'q': 'Empire State Building',
    'format': 'json'
}
response = requests.get(search_url, params=data)

In [30]:
first = response.json()[0]  # parse the JSON once and keep the top match
print(first['display_name'])
print('latitude: ', first['lat'])
print('longitude: ', first['lon'])

Empire State Building, 350, 5th Avenue, Korea Town, Midtown South, Manhattan, Manhattan Community Board 5, New York County, New York City, New York, 10001, USA
latitude:  40.7484284
longitude:  -73.9856546198733


The API can also search by `postalcode` directly (together with `countrycode`), which avoids the ambiguity of a free-text query.

Example: 

In [31]:
data = {'key': apikey,'postalcode':'M5K','countrycode':'CA','format': 'json'}
response = requests.get(search_url, params=data)
response.json()

[{'boundingbox': ['43.6469', '43.6469', '-79.3823', '-79.3823'],
  'class': 'place',
  'display_name': 'Downtown Toronto (Toronto Dominion Centre / Design Exchange), Toronto, Ontario, M5K, Canada',
  'importance': 0.1,
  'lat': '43.6469',
  'licence': '© LocationIQ.com CC BY 4.0, Data © OpenStreetMap contributors, ODbL 1.0',
  'lon': '-79.3823',
  'place_id': '75636',
  'type': 'postcode'}]
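
Note that the response carries `lat` and `lon` as strings. A minimal sketch of pulling the coordinates out of the top match and converting them to floats; the helper name `top_match_coords` is hypothetical, and the sample payload is trimmed from the `M5K` output above.

```python
def top_match_coords(matches):
    """Return (lat, lon) as floats from the first entry of a parsed search response."""
    first = matches[0]
    return float(first['lat']), float(first['lon'])

# Sample payload trimmed from the M5K response shown above
sample = [{'lat': '43.6469', 'lon': '-79.3823', 'type': 'postcode'}]
print(top_match_coords(sample))
```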

-----
#### Toronto Neighbourhoods - postcodes and geocodes
I exported the `Toronto_Neighbourhoods.csv` in the first notebook [Toronto_Neighborhoods.ipynb](https://github.com/rareal/Coursera_Capstone/blob/master/Toronto_Neighborhoods.ipynb)   
Importing into a pandas DataFrame:

In [32]:
Toronto_neigh = pd.read_csv('Toronto_Neighbourhoods.csv',index_col=[0])

Now we need to get the geocode for each postcode in the dataframe. First, let's initialize two pandas Series, `lat` and `lon`, to store the data, filled with the placeholder string `'None'`.

In [33]:
nrow = len(Toronto_neigh.Postcode)
lat = pd.Series(['None']*nrow)
lon = pd.Series(['None']*nrow)
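
An alternative worth considering, which is not what this notebook does: fill the Series with `np.nan` instead of the string `'None'`, so the data stays numeric and missing lookups can be counted with `isna()`. A minimal sketch with a made-up row count; the `*_demo` names are hypothetical.

```python
import numpy as np
import pandas as pd

nrow_demo = 3  # hypothetical row count, for illustration only
lat_demo = pd.Series(np.nan, index=range(nrow_demo), dtype=float)
lon_demo = pd.Series(np.nan, index=range(nrow_demo), dtype=float)

# The API returns strings, so convert on assignment
lat_demo[0] = float('43.6469')
lon_demo[0] = float('-79.3823')

missing = int(lat_demo.isna().sum())  # rows still without a latitude
print(missing)
```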

Now we loop over the postcodes, get the geocode for each one and store it in the `lat` and `lon` variables.    
The API limit is 1 request per second, so the loop includes a `sleep`.  
An error handler covers the case where a postcode is not found.

In [38]:
# quick check of how end='\r' overwrites the same output line on each print
for i in range(10):
    print('car',i,end='\r')
    print('plane',i,end='\r')
    time.sleep(1)

plane 9

In [40]:
for i in range(nrow):
    PC = Toronto_neigh.Postcode[i]
    data = {'key': apikey,'postalcode':'{}'.format(PC),'countrycode':'CA','format': 'json'}
    try:
        response = requests.get(search_url, params=data)
        response_json = response.json()
        lat[i] = response_json[0]['lat']
        lon[i] = response_json[0]['lon']
    except Exception:
        # lookup failed: keep the 'None' placeholder, but fall through to the
        # sleep below (a `continue` here would skip the pause and break the rate limit)
        pass
    # progress line, overwritten in place on each iteration
    print('i: {}, PC: {}, Lat: {}, Lon: {}'.format(i,PC,lat[i],lon[i]),end='\r')
    time.sleep(1)

i: 102, PC: M8Z, Lat: 43.6256, Lon: -79.5231
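
One way to make sure the pause happens after every request, including failed ones, is a `try`/`finally`. The sketch below is only an illustration under assumptions: `geocode_all` and `fake_lookup` are hypothetical names, and the lookup is passed in as a callable so the loop can be tried offline instead of calling the API.

```python
import time

def geocode_all(postcodes, lookup, delay=1.0, sleep=time.sleep):
    """Call lookup(postcode) for each postcode, pausing after every attempt.

    lookup is any callable returning (lat, lon) or raising on failure; it
    stands in for the requests.get call used in the notebook.
    """
    results = {}
    for pc in postcodes:
        try:
            results[pc] = lookup(pc)
        except Exception:
            results[pc] = None  # mark the postcode as not found and keep going
        finally:
            sleep(delay)  # runs on success and on failure alike
    return results

# Offline stand-in for the API, using values that appear in this notebook
def fake_lookup(pc):
    known = {'M5K': ('43.6469', '-79.3823'), 'M8Z': ('43.6256', '-79.5231')}
    return known[pc]  # raises KeyError for unknown postcodes

found = geocode_all(['M5K', 'M7R', 'M8Z'], fake_lookup, delay=0)
print(found)
```

With the real API, `lookup` would wrap the `requests.get` call and `delay` would stay at 1 second.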

In [41]:
pd.Series(lat!='None').value_counts()

True     102
False      1
dtype: int64

One postcode could not be found.

In [42]:
Toronto_neigh[lat=='None']

Unnamed: 0,Postcode,Borough,Neighbourhood
76,M7R,Mississauga,Canada Post Gateway Processing Centre


In [43]:
data = {'key': apikey,'postalcode':'{}'.format(Toronto_neigh.Postcode[76]),'countrycode':'CA','format': 'json'}
response = requests.get(search_url, params=data)
response.json()

{'error': 'Unable to geocode'}
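
So a failed lookup comes back as a dict with an `error` key rather than a list of matches. A small sketch of telling the two shapes apart before indexing; the helper name `first_match` is hypothetical, and only the two payload shapes seen in this notebook are assumed.

```python
def first_match(payload):
    """Return the first match from a parsed response, or None if there is none."""
    if isinstance(payload, list) and payload:
        return payload[0]
    return None  # e.g. {'error': 'Unable to geocode'}

print(first_match({'error': 'Unable to geocode'}))
print(first_match([{'lat': '43.6469', 'lon': '-79.3823'}]))
```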

Trying the search query method, using the Borough and Neighbourhood names:

In [44]:
data = {'key': apikey,'q':', '.join(Toronto_neigh.iloc[76,1:3].values),'format': 'json'}
response = requests.get(search_url, params=data)
res = response.json()

In [45]:
# extracting the lat and lon
print('matches:',len(res))
for item in res:
    try:
        print('lat: {}, lon: {}, postcode: {}'.format(item['lat'],item['lon'],item['postcode']))
    except Exception:
        print('lat: {}, lon: {}, no postcode'.format(item['lat'],item['lon']))

matches: 10
lat: 43.596832, lon: -79.623997, no postcode
lat: 43.570452, lon: -79.626636, no postcode
lat: 43.569545, lon: -79.59661, no postcode
lat: 43.716528, lon: -79.637611, no postcode
lat: 43.66198, lon: -79.665466, no postcode
lat: 43.644478, lon: -79.708221, no postcode
lat: 43.639839, lon: -79.713425, no postcode
lat: 43.625576, lon: -79.676659, no postcode
lat: 43.649548, lon: -79.666832, no postcode
lat: 43.654701, lon: -79.665771, no postcode


In [46]:
# average
print('lat: ',pd.Series([x['lat'] for x in res]).astype(float).mean())
print('lon: ',pd.Series([x['lon'] for x in res]).astype(float).mean())

lat:  43.63294789999999
lon:  -79.6581228


The geocodes were provided in the assignment at http://cocl.us/Geospatial_data, which serves the file `Geospatial_Coordinates.csv`. Importing the file to check the coordinates expected for `M7R`:

In [47]:
ref_codes = pd.read_csv('Geospatial_Coordinates.csv')
ref_codes.head(3)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711


In [48]:
ref_codes[ref_codes['Postal Code']=='M7R']

Unnamed: 0,Postal Code,Latitude,Longitude
86,M7R,43.636966,-79.615819


This is similar to the average I got from the address search:

geocodes|mine|reference
---|---|---
lat|43.632948|43.636966
lon|-79.658123|-79.615819
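
To put a number on "similar", the gap between the two points can be expressed as a distance. A minimal sketch using the haversine formula on a spherical Earth; the function name `haversine_km` and the 6371 km mean radius are assumptions of this sketch, not part of the assignment.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# averaged search result vs. reference coordinates for M7R
d = haversine_km(43.632948, -79.658123, 43.636966, -79.615819)
print(round(d, 2))
```

The two points are a few kilometres apart, most of it along the longitude.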

However, none of the matches included a postcode to confirm this. LocationIQ also offers a reverse search, which returns an address for a pair of coordinates. I'll run it with the coordinates from the reference file.

In [49]:
# Reverse Geocoding method url
reverse_url = "https://us1.locationiq.com/v1/reverse.php"
# M7R
latt=43.6369656
long=-79.615819

data = {'key': apikey,'lat': latt,'lon': long,'format': 'json'}
response = requests.get(reverse_url, params=data)

In [50]:
response.json()['address']['postcode']

'L4W 5G6'
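
The reverse search returns a full six-character postcode, while the table holds only the first three characters (the forward sortation area). A small sketch of comparing the two on equal terms; the helper name `fsa` is hypothetical.

```python
def fsa(postcode):
    """Return the forward sortation area: the first three characters of a Canadian postcode."""
    return postcode.replace(' ', '').upper()[:3]

print(fsa('L4W 5G6'))
print(fsa('L4W 5G6') == fsa('M7R'))
```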

Maybe this is because the code in question belongs to the `Canada Post Gateway Processing Centre`, so it may be a special code used only for that place.   
In any case, I'm updating my table with the reference geocodes for that postcode.

In [51]:
lat[76]=latt
lon[76]=long
print(lat[76],lon[76])

43.6369656 -79.615819


Joining in a DataFrame:

In [79]:
Toronto_neigh['Latitude']= lat.astype(float)
Toronto_neigh['Longitude']= lon.astype(float)
Toronto_neigh.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.6555,-79.3626
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.7223,-79.4504
4,M7A,Queen's Park,Queen's Park,43.6641,-79.3889


Now I will compare the geocodes I got with the ones provided, to see if they match.

0      M1B
1      M1C
2      M1E
3      M1G
4      M1H
5      M1J
6      M1K
7      M1L
8      M1M
9      M1N
10     M1P
11     M1R
12     M1S
13     M1T
14     M1V
15     M1W
16     M1X
17     M2H
18     M2J
19     M2K
20     M2L
21     M2M
22     M2N
23     M2P
24     M2R
25     M3A
26     M3B
27     M3C
28     M3H
29     M3J
      ... 
73     M6C
74     M6E
75     M6G
76     M6H
77     M6J
78     M6K
79     M6L
80     M6M
81     M6N
82     M6P
83     M6R
84     M6S
85     M7A
86     M7R
87     M7Y
88     M8V
89     M8W
90     M8X
91     M8Y
92     M8Z
93     M9A
94     M9B
95     M9C
96     M9L
97     M9M
98     M9N
99     M9P
100    M9R
101    M9V
102    M9W
Name: Postal Code, dtype: object

In [97]:
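# latitude diff, row by row. Note this is positional: ref_codes starts at M1B
# while Toronto_neigh starts at M3A, so the rows are not yet matched by postcode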
latdiff = abs(ref_codes.Latitude.values - Toronto_neigh.Latitude.values)
pd.Series(latdiff)[latdiff>0.1].count()/len(latdiff)

0.1941747572815534

In [96]:
# longitude diff, row by row (positional, same caveat as for the latitude)
londiff = abs(ref_codes.Longitude.values - Toronto_neigh.Longitude.values)
pd.Series(londiff)[londiff>0.1].count()/len(londiff)

0.47572815533980584

In [99]:
# max difference
max(latdiff),max(londiff)

(0.19048480000000012, 0.3703007999999812)

In [102]:
mine=Toronto_neigh.copy()

In [110]:
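# attempt to reorder my rows by the reference postcodes; plain [] on a
# DataFrame selects columns, not rows, hence the KeyError below
mine.set_index('Postcode')[ref_codes['Postal Code']]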

KeyError: "['M1B' 'M1C' 'M1E' 'M1G' 'M1H' 'M1J' 'M1K' 'M1L' 'M1M' 'M1N' 'M1P' 'M1R'\n 'M1S' 'M1T' 'M1V' 'M1W' 'M1X' 'M2H' 'M2J' 'M2K' 'M2L' 'M2M' 'M2N' 'M2P'\n 'M2R' 'M3A' 'M3B' 'M3C' 'M3H' 'M3J' 'M3K' 'M3L' 'M3M' 'M3N' 'M4A' 'M4B'\n 'M4C' 'M4E' 'M4G' 'M4H' 'M4J' 'M4K' 'M4L' 'M4M' 'M4N' 'M4P' 'M4R' 'M4S'\n 'M4T' 'M4V' 'M4W' 'M4X' 'M4Y' 'M5A' 'M5B' 'M5C' 'M5E' 'M5G' 'M5H' 'M5J'\n 'M5K' 'M5L' 'M5M' 'M5N' 'M5P' 'M5R' 'M5S' 'M5T' 'M5V' 'M5W' 'M5X' 'M6A'\n 'M6B' 'M6C' 'M6E' 'M6G' 'M6H' 'M6J' 'M6K' 'M6L' 'M6M' 'M6N' 'M6P' 'M6R'\n 'M6S' 'M7A' 'M7R' 'M7Y' 'M8V' 'M8W' 'M8X' 'M8Y' 'M8Z' 'M9A' 'M9B' 'M9C'\n 'M9L' 'M9M' 'M9N' 'M9P' 'M9R' 'M9V' 'M9W'] not in index"
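
The `KeyError` arises because plain `[]` on a DataFrame looks the labels up among the columns. Selecting rows by index label goes through `.loc`. A minimal sketch on toy data; the `*_demo` names are hypothetical and the `M1B` latitude is made up for illustration.

```python
import pandas as pd

mine_demo = pd.DataFrame({
    'Postcode': ['M3A', 'M4A', 'M1B'],
    'Latitude': [43.7545, 43.7276, 43.8],  # M1B value is illustrative only
})
ref_order = pd.Series(['M1B', 'M3A', 'M4A'], name='Postal Code')

# .loc selects rows by index label, in the order the labels are given
aligned = mine_demo.set_index('Postcode').loc[ref_order.tolist()]
print(aligned)
```

Note that `.loc` with a list of labels raises a `KeyError` if any label is missing from the index.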

In [119]:
# note: `in` on a Series tests the index labels (0..102 here), not the values
'M6P' in mine.Postcode

False
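
The `False` here does not mean the postcode is absent: `in` on a pandas Series checks the index labels, not the stored values. A sketch of a value-based membership test, followed by one possible way to line the two tables up by postcode with `merge` before taking differences. This is an assumption about how the comparison could be finished, on toy data with hypothetical `*_demo` names and illustrative reference values.

```python
import pandas as pd

mine_demo = pd.DataFrame({
    'Postcode': ['M3A', 'M4A'],
    'Latitude': [43.7545, 43.7276],
})
ref_demo = pd.DataFrame({
    'Postal Code': ['M4A', 'M3A'],   # deliberately in a different order
    'Latitude': [43.7259, 43.7533],  # illustrative reference values
})

print('M3A' in mine_demo.Postcode)         # tests the index labels 0 and 1
print('M3A' in mine_demo.Postcode.values)  # tests the stored values

# Join on the postcode so each row pairs my value with the matching reference
merged = mine_demo.merge(ref_demo, left_on='Postcode', right_on='Postal Code',
                         suffixes=('_mine', '_ref'))
merged['lat_diff'] = (merged['Latitude_mine'] - merged['Latitude_ref']).abs()
print(merged[['Postcode', 'lat_diff']])
```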