# Toronto Neighbourhoods - geocodes
#### This is part of the Course <u>*Applied Data Science Capstone*</u> on Coursera, to complete the Specialization <u>*IBM Data Science Professional Certificate*</u>

This exercise retrieves the geocodes for the Toronto neighbourhoods collected in the first [notebook](https://github.com/rareal/Coursera_Capstone/blob/master/Toronto_Neighborhoods.ipynb), that is, the latitude and longitude for each postcode in the dataframe. 
The instructions suggest using the `geocoder` package, but it was not working for me. After looking around I found https://my.locationiq.com, which provides a developer token with 10000 free calls a day. See the **LocationIQ** [api docs](https://locationiq.com/docs#forward-geocoding).


[Other googlemaps alternatives](http://geoawesomeness.com/google-maps-api-alternatives-best-cheap-affordable/).

---------------
Importing dependencies:

In [1]:
import requests
import pandas as pd
import numpy as np
import time

In [2]:
apikey = '3519d86646e89c'
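
A token written directly into the notebook is visible to anyone who can read the repository. A minimal sketch of reading it from an environment variable instead; the variable name `LOCATIONIQ_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something LocationIQ prescribes.

```python
import os


def load_api_key(env_var="LOCATIONIQ_API_KEY"):
    """Return the token stored in the given environment variable.

    The variable name is a hypothetical choice for this notebook.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {} environment variable to your LocationIQ token".format(env_var)
        )
    return key
```

With the variable set in the shell that launches Jupyter, the cell above could become `apikey = load_api_key()`.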

LocationIQ search api call example:

In [56]:
# Search / Forward Geocoding url
search_url = "https://us1.locationiq.com/v1/search.php"
data = {
    'key': apikey,
    'q': 'Empire State Building',
    'format': 'json'
}
response = requests.get(search_url, params=data)

In [4]:
print(response.json()[0]['display_name'])
print('latitude: ',response.json()[0]['lat'])
print('longitude: ',response.json()[0]['lon'])

Empire State Building, 350, 5th Avenue, Korea Town, Midtown South, Manhattan, Manhattan Community Board 5, New York County, New York City, New York, 10001, USA
latitude:  40.7484284
longitude:  -73.9856546198733


The API can also search by postal code directly, which produces more robust results. 

Example: 

In [57]:
data = {'key': apikey,'postalcode':'M5K','countrycode':'CA','format': 'json'}
response = requests.get(search_url, params=data)
response.json()

[{'place_id': '75636',
  'licence': '© LocationIQ.com CC BY 4.0, Data © OpenStreetMap contributors, ODbL 1.0',
  'boundingbox': ['43.6469', '43.6469', '-79.3823', '-79.3823'],
  'lat': '43.6469',
  'lon': '-79.3823',
  'display_name': 'Downtown Toronto (Toronto Dominion Centre / Design Exchange), Toronto, Ontario, M5K, Canada',
  'class': 'place',
  'type': 'postcode',
  'importance': 0.1}]
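
The output above shows that a successful search returns a list of matches and that `lat` and `lon` arrive as strings. A small sketch of a helper that pulls the first match out as floats; `first_lat_lon` is a hypothetical name, and treating any payload that is not a non-empty list as "no match" is an assumption.

```python
def first_lat_lon(payload):
    """Return (lat, lon) as floats from a search response, or None.

    Assumes a successful response is a non-empty list of dicts with
    string 'lat' and 'lon' fields, as in the output above.
    """
    if not isinstance(payload, list) or not payload:
        return None
    first = payload[0]
    try:
        return float(first["lat"]), float(first["lon"])
    except (KeyError, TypeError, ValueError):
        return None


sample = [{"lat": "43.6469", "lon": "-79.3823", "type": "postcode"}]
print(first_lat_lon(sample))                          # (43.6469, -79.3823)
print(first_lat_lon({"error": "Unable to geocode"}))  # None
```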

-----
#### Toronto Neighbourhoods - postcodes and geocodes
I got the `Toronto_Neighbourhoods.csv` from the first notebook [Toronto_Neighborhoods.ipynb](https://github.com/rareal/Coursera_Capstone/blob/master/Toronto_Neighborhoods.ipynb)   
Importing into a pandas DataFrame:

In [29]:
Toronto_neigh = pd.read_csv('Toronto_Neighbourhoods.csv',index_col=[0])

Now we need to get the geocode for each postcode in the dataframe. First, let's initialize two pandas Series to store the data, `lat` and `lon`, filled with the placeholder string `'None'`.

In [30]:
nrow = len(Toronto_neigh.Postcode)
lat = pd.Series(['None']*nrow)
lon = pd.Series(['None']*nrow)

Now we loop over the postcodes, get the geocode for each one and store it in the `lat` and `lon` variables.  
The LocationIQ api has a postcode search method, which also takes a `countrycode`.  
The API limit is 1 request per second, so it's better to include a `sleep` in the loop.

In [None]:
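for i in range(nrow):
    print('i: ',i)
    PC = Toronto_neigh.Postcode[i]
    data = {'key': apikey,'postalcode':'{}'.format(PC),'countrycode':'CA','format': 'json'}
    try:
        response = requests.get(search_url, params=data)
        response_json = response.json()
        # A failed lookup returns a dict rather than a list, so indexing [0] raises
        lat[i] = response_json[0]['lat']
        lon[i] = response_json[0]['lon']
        print('PC: {}, Lat: {}, Lon: {}'.format(PC,lat[i],lon[i]))
    except Exception:
        # Keep the 'None' placeholder for postcodes that cannot be geocoded
        pass
    finally:
        # Pause after every request, including failed ones, to respect the rate limit
        time.sleep(1)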
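
The same loop can also be written as a function that records which postcodes failed, so the missing ones do not have to be found afterwards by comparing against a placeholder. This is only a sketch: `geocode_all`, `fetch` and `pause` are hypothetical names, and `fetch` stands in for the `requests.get(...).json()` call so the logic can be run without network access.

```python
import time


def geocode_all(postcodes, fetch, pause=1.0):
    """Geocode each postcode with `fetch`, pausing between calls.

    `fetch(postcode)` is a caller-supplied function (hypothetical here)
    that returns the decoded JSON of a search request.
    Returns (coords, failed): a dict postcode -> (lat, lon) and a list
    of the postcodes that could not be geocoded.
    """
    coords, failed = {}, []
    for pc in postcodes:
        try:
            first = fetch(pc)[0]
            coords[pc] = (float(first["lat"]), float(first["lon"]))
        except (KeyError, IndexError, TypeError, ValueError):
            failed.append(pc)
        finally:
            time.sleep(pause)  # pause after every call, success or not
    return coords, failed


# Stand-in for the real request, using values from this notebook.
def fake_fetch(pc):
    if pc == "M5K":
        return [{"lat": "43.6469", "lon": "-79.3823"}]
    return {"error": "Unable to geocode"}


coords, failed = geocode_all(["M5K", "M7R"], fake_fetch, pause=0)
print(coords)  # {'M5K': (43.6469, -79.3823)}
print(failed)  # ['M7R']
```

A real `fetch` built on `requests` would also need to decide how to handle network errors, which this sketch does not catch.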

One postcode could not be found.

In [106]:
pd.Series(lat!='None').value_counts()

True     102
False      1
dtype: int64

In [54]:
Toronto_neigh[lat=='None']

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 76 | M7R | Mississauga | Canada Post Gateway Processing Centre |


In [58]:
data = {'key': apikey,'postalcode':'{}'.format(Toronto_neigh.Postcode[76]),'countrycode':'CA','format': 'json'}
response = requests.get(search_url, params=data)
response.json()

{'error': 'Unable to geocode'}

Trying the search query method, using the Borough and Neighbourhood names:

In [61]:
data = {'key': apikey,'q':', '.join(Toronto_neigh.iloc[76,1:3].values),'format': 'json'}
response = requests.get(search_url, params=data)
res = response.json()

In [82]:
print('matches:',len(res))
for item in res:
    try:
        print('lat: {}, lon: {}, postcode: {}'.format(item['lat'],item['lon'],item['postcode']))
    except Exception:
        print('lat: {}, lon: {}, no postcode'.format(item['lat'],item['lon']))

matches: 10
lat: 43.596832, lon: -79.623997, no postcode
lat: 43.570452, lon: -79.626636, no postcode
lat: 43.569545, lon: -79.59661, no postcode
lat: 43.716528, lon: -79.637611, no postcode
lat: 43.66198, lon: -79.665466, no postcode
lat: 43.644478, lon: -79.708221, no postcode
lat: 43.639839, lon: -79.713425, no postcode
lat: 43.625576, lon: -79.676659, no postcode
lat: 43.649548, lon: -79.666832, no postcode
lat: 43.654701, lon: -79.665771, no postcode


In [94]:
print('lat: ',pd.Series([x['lat'] for x in res]).astype(float).mean())
print('lon: ',pd.Series([x['lon'] for x in res]).astype(float).mean())

lat:  43.63294789999999
lon:  -79.6581228
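
The mean of the matches is only a rough position, since the candidates differ by more than a tenth of a degree in latitude. A sketch for gauging that spread with the haversine great-circle distance; `haversine_km` is a hypothetical helper, and the Earth radius of 6371 km is the usual mean-radius approximation.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


# A few of the matches printed above, and their mean from this notebook.
points = [(43.596832, -79.623997), (43.716528, -79.637611), (43.644478, -79.708221)]
centre = (43.63294789999999, -79.6581228)
for lat, lon in points:
    print(round(haversine_km(lat, lon, centre[0], centre[1]), 1), "km from the mean")
```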


In [59]:
# Reverse Geocoding method url
reverse_url = "https://us1.locationiq.com/v1/reverse.php"
# M7R
latt=43.6369656
long=-79.615819

data = {'key': apikey,'lat': latt,'lon': long,'format': 'json'}
response = requests.get(reverse_url, params=data)
response.json()

{'place_id': '85793346',
 'licence': '© LocationIQ.com CC BY 4.0, Data © OpenStreetMap contributors, ODbL 1.0',
 'osm_type': 'way',
 'osm_id': '34551703',
 'lat': '43.63645615',
 'lon': '-79.6149124359677',
 'display_name': 'Canada Post Gateway sorting station, South Gateway Road, Rathwood, Mississauga, Peel Region, Ontario, L4W 5G6, Canada',
 'address': {'building': 'Canada Post Gateway sorting station',
  'road': 'South Gateway Road',
  'neighbourhood': 'Rathwood',
  'city': 'Mississauga',
  'county': 'Peel Region',
  'state': 'Ontario',
  'postcode': 'L4W 5G6',
  'country': 'Canada',
  'country_code': 'ca'},
 'boundingbox': ['43.6348319', '43.637991', '-79.6186161', '-79.6119281']}