## Getting Geo-coordinates for WSJ Colleges
Here we use a couple of Python tools to build a database of latitude/longitude coordinates for the schools in the report. I'm doing this to compare the speed and accuracy of the ArcGIS maps included in Power BI with hard-coding the coordinates.

Our strategy is:
- Create a search string using the college name and city.
- Use `requests` to query the Google Maps Geocoding API.
- Save the database as a new file.

First, we read in the WSJ data and create a search string.

In [10]:
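import os

import pandas as pd

wsj = pd.read_csv('wsj_data.csv')

# Reuse previously saved coordinates if they exist, so reruns only query new schools.
if os.path.exists('wsj_locs.csv'):
    geodf = pd.read_csv('wsj_locs.csv', index_col='loc_string')
else:
    # Define the columns up front so rows can be added with .loc later.
    geodf = pd.DataFrame(columns=['Address', 'Latitude', 'Longitude'])
    geodf.index.name = 'loc_string'

wsj.head()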


Unnamed: 0,rank,college,city_state,overall,outcome,resources,engagement,environment,right_choice,salary,default_rate,class,loc_string
0,1,Harvard University,"Cambridge, MA",91.9,39.5,29.8,15.6,7.0,9.09,91000,0.9,Private,"Harvard University, Cambridge, MA, USA"
1,2,Columbia University,"New York, NY",90.6,39.0,27.0,16.1,7.8,8.06,74000,1.4,Private,"Columbia University, New York, NY, USA"
2,3,Massachusetts Institute of Technology,"Cambridge, MA",90.4,38.2,29.2,15.8,7.2,9.11,90000,1.1,Private,"Massachusetts Institute of Technology, Cambrid..."
3,3,Stanford University,"Stanford, CA",90.4,38.9,26.2,17.4,7.9,8.96,83000,0.8,Private,"Stanford University, Stanford, CA, USA"
4,5,Duke University,"Durham, NC",90.2,39.5,26.7,17.2,6.8,9.19,77000,0.4,Private,"Duke University, Durham, NC, USA"


For each college, we're going to create a search string as if we were looking it up in Google Maps. It's important to include as much information as we have so that the location service doesn't get confused with institutions in other countries, for example.

In [18]:
# Set to True to rebuild the search strings and save them back to the CSV.
overwrite_loc_string = False
if overwrite_loc_string:
    wsj['loc_string'] = wsj.apply(lambda s: '{}, {}, USA'.format(s.college, s.city_state), axis=1)
    wsj.to_csv('wsj_data.csv', encoding='utf-8', index=False)

print(wsj.loc_string[0:5])

0               Harvard University, Cambridge, MA, USA
1               Columbia University, New York, NY, USA
2    Massachusetts Institute of Technology, Cambrid...
3               Stanford University, Stanford, CA, USA
4                     Duke University, Durham, NC, USA
Name: loc_string, dtype: object
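
As a side note, the same search string can be built without a row-wise `apply`, and it helps to see what it looks like once it is URL-encoded for a query. The sketch below is only an illustration: the two-row `sample` frame is a stand-in for the WSJ table (rows copied from the output above), and `urlencode` from the standard library is my choice for the encoding step, not something the notebook itself uses.

```python
from urllib.parse import urlencode

import pandas as pd

# Toy stand-in for the WSJ table (two rows copied from the output above).
sample = pd.DataFrame({
    'college': ['Harvard University', 'Duke University'],
    'city_state': ['Cambridge, MA', 'Durham, NC'],
})

# Vectorized string concatenation; gives the same text as the apply/format version.
sample['loc_string'] = sample['college'] + ', ' + sample['city_state'] + ', USA'

# urlencode escapes spaces and commas so the string is safe inside a query URL.
query = urlencode({'address': sample['loc_string'][0]})
print(query)  # address=Harvard+University%2C+Cambridge%2C+MA%2C+USA
```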


In [3]:
def getCoords(search_string):
    '''Takes a search term, queries Google and returns the geocoordinates.'''
    import requests

    empty = {'Address': None, 'Latitude': None, 'Longitude': None}
    try:
        # Let requests URL-encode the address instead of replacing spaces by hand.
        # Note: current versions of the API also require a `key` parameter.
        response = requests.get('https://maps.googleapis.com/maps/api/geocode/json',
                                params={'address': search_string}, timeout=10)
        # Use the first (best) match.
        result = response.json()['results'][0]
        location = result['geometry']['location']

        return pd.Series(name=search_string,
                         data={'Address': result['formatted_address'],
                               'Latitude': location['lat'],
                               'Longitude': location['lng']})
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # Network failure, non-JSON reply, or no results: return an empty row.
        return pd.Series(name=search_string, data=empty)
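
To make the response handling easier to follow without hitting the network, here is a small offline sketch of the parsing step. The helper name `parse_geocode` and both payloads are hypothetical: the first is hand-built to mirror only the fields `getCoords` reads (with the Gonzaga values from the output further down), and the second mimics a reply with no matches. Checking the `status` field is an extra step that the function above does not do.

```python
def parse_geocode(payload):
    """Return (address, lat, lng) from a Geocoding-style payload, or None."""
    # Anything other than 'OK' (e.g. 'ZERO_RESULTS') means there is nothing to read.
    if payload.get('status') != 'OK' or not payload.get('results'):
        return None
    top = payload['results'][0]          # first (best) match
    loc = top['geometry']['location']
    return top['formatted_address'], loc['lat'], loc['lng']


# Hand-built payloads containing only the fields used above.
ok_payload = {
    'status': 'OK',
    'results': [{
        'formatted_address': '502 E Boone Ave, Spokane, WA 99202, USA',
        'geometry': {'location': {'lat': 47.6672, 'lng': -117.402}},
    }],
}
empty_payload = {'status': 'ZERO_RESULTS', 'results': []}

print(parse_geocode(ok_payload))
print(parse_geocode(empty_payload))
```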

In [21]:
for ind, school in wsj.loc_string.items():
    # Query only schools that are missing or whose earlier lookup failed.
    if (school not in geodf.index) or pd.isna(geodf.loc[school, 'Address']):
        data = getCoords(school)
        geodf.loc[school] = data
        print(school, '\n\t\t ', data)

Gonzaga University, Spokane, WA, USA 
		  Address      502 E Boone Ave, Spokane, WA 99202, USA
Latitude                                     47.6672
Longitude                                   -117.402
Name: Gonzaga University, Spokane, WA, USA, dtype: object
Campbell University, Buies Creek, NC, USA 
		  Address      143 Main St, Buies Creek, NC 27506, USA
Latitude                                     35.4083
Longitude                                   -78.7394
Name: Campbell University, Buies Creek, NC, USA, dtype: object
SUNY New Paltz, New Paltz NY, USA 
		  Address      1 Hawk Dr, New Paltz, NY 12561, USA
Latitude                                  41.739
Longitude                               -74.0852
Name: SUNY New Paltz, New Paltz NY, USA, dtype: object
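
The loop only queries schools that are not yet in `geodf` or whose earlier lookup came back empty, which is why just a few schools are printed on a rerun. A minimal offline sketch of that selection step is below. The `cache` frame and `wanted` list are made up for illustration (the SUNY New Paltz row is deliberately left empty here to stand in for a failed lookup). One thing worth knowing: a missing address read back from `wsj_locs.csv` arrives as NaN rather than `None`, and `pd.isna` covers both.

```python
import pandas as pd

# Cache with one successful row and one failed row (Address is missing).
cache = pd.DataFrame(
    {'Address': ['502 E Boone Ave, Spokane, WA 99202, USA', None],
     'Latitude': [47.6672, None],
     'Longitude': [-117.402, None]},
    index=pd.Index(['Gonzaga University, Spokane, WA, USA',
                    'SUNY New Paltz, New Paltz NY, USA'], name='loc_string'),
)

wanted = [
    'Gonzaga University, Spokane, WA, USA',       # cached, skipped
    'SUNY New Paltz, New Paltz NY, USA',          # failed earlier, retried
    'Campbell University, Buies Creek, NC, USA',  # new, looked up
]

# pd.isna is True for both None and NaN, so failed rows are picked up again.
todo = [s for s in wanted
        if s not in cache.index or pd.isna(cache.loc[s, 'Address'])]
print(todo)
```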


In [22]:
geodf.to_csv('wsj_locs.csv', encoding='utf-8')
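
For the Power BI comparison, the coordinates eventually need to sit next to the school data. One way to do that, sketched here under assumptions, is a left join on `loc_string`. The `schools` and `coords` frames are small stand-ins for `wsj` and `geodf` (coordinates copied from the output above), and Harvard is left without coordinates to show what an unmatched school looks like.

```python
import pandas as pd

# Stand-in for the WSJ table: only the columns needed for the join.
schools = pd.DataFrame({
    'college': ['Gonzaga University', 'Campbell University', 'Harvard University'],
    'loc_string': ['Gonzaga University, Spokane, WA, USA',
                   'Campbell University, Buies Creek, NC, USA',
                   'Harvard University, Cambridge, MA, USA'],
})

# Stand-in for the saved coordinates, indexed by loc_string like wsj_locs.csv.
coords = pd.DataFrame(
    {'Latitude': [47.6672, 35.4083], 'Longitude': [-117.402, -78.7394]},
    index=pd.Index(['Gonzaga University, Spokane, WA, USA',
                    'Campbell University, Buies Creek, NC, USA'], name='loc_string'),
)

# join(on=...) matches the left column against the right index (left join by default),
# so schools without coordinates are kept and get NaN.
merged = schools.join(coords, on='loc_string')
print(merged[['college', 'Latitude', 'Longitude']])
```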
