# Determine GPS Coordinates for a List of Locations & Save in a Pandas DataFrame

__Goal:__ Query the `Google Maps Geocoding API` for the GPS coordinates of multiple locations, build a `pandas` DataFrame from the query results, and save that DataFrame to a comma-separated values (CSV) file.


## Access the Google Maps Platform

We use the `Google Maps Geocoding API` (part of the Google Maps Platform) to determine coordinate values. To use this API, you need a Google Maps API key, which is available with a Google Account.

<img src="images/google_maps_platform.png">

You must set up a billing account to use the Google Maps Platform. At the time of writing, you get $200 in free usage every month, which is enough to determine a large number of GPS coordinates for free.

1. Go to the [Google Cloud Platform Console](https://console.cloud.google.com/getting-started).
2. Click the `navigation menu` button (the three horizontal lines in the upper left-hand corner).
3. Select `Billing`.
4. Set up your billing account.

Now, let's create a `coordinates` project.

1. Click the `navigation menu` button (the three horizontal lines in the upper left-hand corner).
2. Select `Home`.
3. Click on the project drop-down in the top navigation bar.
4. Click `NEW PROJECT`.
5. Enter `coordinates` in the `Project name` field.
6. Click `Create`.

Now, let's enable the necessary APIs.

1. Click on the project drop-down in the top navigation bar.
2. Select the `coordinates` project.
3. Click the `navigation menu` button (the three horizontal lines in the upper left-hand corner).
4. Select `APIs & Services`.
5. Click `+ ENABLE APIS & SERVICES`.
6. Search for and select the `Geocoding API`.
7. Click `ENABLE`.

Finally, let's create an API key.

1. Click the `navigation menu` button (the three horizontal lines in the upper left-hand corner).
2. Select `APIs & Services > Credentials`.
3. Click `+ CREATE CREDENTIALS`.
4. Select `API key`.
5. Copy your API key. You will use it shortly.


## Collect Coordinates Data

Now we move to a Jupyter notebook. To begin, we import the necessary libraries.

In [1]:
# Import necessary libraries
import googlemaps
import pandas as pd

We define a constant for our Google Maps API key (be sure to replace `YOUR_API_KEY` with your actual API key).

In [2]:
# Enter Google Maps API key
GOOGLE_MAPS_API_KEY = "YOUR_API_KEY"
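A key pasted directly into a notebook is easy to leak when the notebook is shared. One alternative, sketched below, is to read the key from an environment variable. The helper name `load_api_key` and the variable name `GOOGLE_MAPS_API_KEY` are illustrative assumptions, not part of the original notebook.

```python
import os


def load_api_key(env_var="GOOGLE_MAPS_API_KEY"):
    """Read the API key from an environment variable (hypothetical helper)."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError("Set the {} environment variable first.".format(env_var))
    return api_key
```

In the notebook, `GOOGLE_MAPS_API_KEY = load_api_key()` would then replace the hardcoded string.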

There are several ways to create a list of locations. In our situation, we have a venues text file containing the venue names.

In [3]:
# Print contents of venues text file
with open('hard_data/all_venues.txt', 'r') as f:
    
    print(f.read())

Amherst College
Bates College
Bowdoin College
Colby College
Connecticut College
Hamilton College
Middlebury College
Trinity College
Tufts University
Wesleyan University
Williams College



We use a list comprehension to read each line of the venue names file, strip surrounding whitespace, and collect the stripped lines into a list of venue names.

In [4]:
# Load venue names from file
with open('hard_data/all_venues.txt', 'r') as f:
    
    # Create list of venue names
    venue_names = [line.strip() for line in f]
    
# Preview list of venue names
venue_names

['Amherst College',
 'Bates College',
 'Bowdoin College',
 'Colby College',
 'Connecticut College',
 'Hamilton College',
 'Middlebury College',
 'Trinity College',
 'Tufts University',
 'Wesleyan University',
 'Williams College']
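Note that `line.strip()` turns a blank line into an empty string, which would later be sent to the geocoder as a query. If the venues file might contain blank or repeated lines, a small helper can filter them out. The sketch below assumes a hypothetical function name, `read_venue_names`, and uses an in-memory sample instead of the real file.

```python
import io


def read_venue_names(lines):
    """Return stripped, non-empty, de-duplicated names in their original order."""
    seen = set()
    names = []
    for line in lines:
        name = line.strip()
        # Skip blank lines and names we have already collected
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# In-memory stand-in for the venues file (sample data for illustration only)
sample_file = io.StringIO("Amherst College\nBates College\n\n  Bates College  \n")
sample_names = read_venue_names(sample_file)
```

Because the function accepts any iterable of lines, it can be called on the open file handle `f` in the same way.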

Looks good. We are now ready to query the `Google Maps Geocoding API`.

We instantiate a `Google Maps API` session and initialize a data dictionary to hold venue names and GPS coordinates. We will eventually use this dictionary to create a `pandas` DataFrame.

In [5]:
# Instantiate Google Maps API session
gmaps = googlemaps.Client(GOOGLE_MAPS_API_KEY)

# Initialize data dictionary to hold values
coordinates_data = {
    'Venue Name': [],
    'Coordinates': [],
    'Latitude': [],
    'Longitude': []
}

Let's check out an example query using `Google Maps Geocoding API`.

In [10]:
# Set example venue
venue = venue_names[0]

# Query Google Maps API for GPS coordinates
results = gmaps.geocode(address=venue)

# Preview results
results

[{'address_components': [{'long_name': 'Amherst',
    'short_name': 'Amherst',
    'types': ['locality', 'political']},
   {'long_name': 'Amherst Center',
    'short_name': 'Amherst Center',
    'types': ['neighborhood', 'political']},
   {'long_name': 'Hampshire County',
    'short_name': 'Hampshire County',
    'types': ['administrative_area_level_2', 'political']},
   {'long_name': 'Massachusetts',
    'short_name': 'MA',
    'types': ['administrative_area_level_1', 'political']},
   {'long_name': 'United States',
    'short_name': 'US',
    'types': ['country', 'political']},
   {'long_name': '01002', 'short_name': '01002', 'types': ['postal_code']}],
  'formatted_address': 'Amherst, MA 01002, USA',
  'geometry': {'location': {'lat': 42.3709104, 'lng': -72.5170028},
   'location_type': 'GEOMETRIC_CENTER',
   'viewport': {'northeast': {'lat': 42.3722593802915,
     'lng': -72.5156538197085},
    'southwest': {'lat': 42.3695614197085, 'lng': -72.51835178029151}}},
  'place_id': 'ChIJ

The query result is a Python list of dictionaries parsed from the API's JSON response. How do we extract the latitude and longitude values? We take it one step at a time.

In [11]:
results[0]

{'address_components': [{'long_name': 'Amherst',
   'short_name': 'Amherst',
   'types': ['locality', 'political']},
  {'long_name': 'Amherst Center',
   'short_name': 'Amherst Center',
   'types': ['neighborhood', 'political']},
  {'long_name': 'Hampshire County',
   'short_name': 'Hampshire County',
   'types': ['administrative_area_level_2', 'political']},
  {'long_name': 'Massachusetts',
   'short_name': 'MA',
   'types': ['administrative_area_level_1', 'political']},
  {'long_name': 'United States',
   'short_name': 'US',
   'types': ['country', 'political']},
  {'long_name': '01002', 'short_name': '01002', 'types': ['postal_code']}],
 'formatted_address': 'Amherst, MA 01002, USA',
 'geometry': {'location': {'lat': 42.3709104, 'lng': -72.5170028},
  'location_type': 'GEOMETRIC_CENTER',
  'viewport': {'northeast': {'lat': 42.3722593802915,
    'lng': -72.5156538197085},
   'southwest': {'lat': 42.3695614197085, 'lng': -72.51835178029151}}},
 'place_id': 'ChIJjzZMG_jN5okRvCYRnhDnGUo

In [12]:
results[0]['geometry']

{'location': {'lat': 42.3709104, 'lng': -72.5170028},
 'location_type': 'GEOMETRIC_CENTER',
 'viewport': {'northeast': {'lat': 42.3722593802915, 'lng': -72.5156538197085},
  'southwest': {'lat': 42.3695614197085, 'lng': -72.51835178029151}}}

In [13]:
results[0]['geometry']['location']

{'lat': 42.3709104, 'lng': -72.5170028}

In [14]:
results[0]['geometry']['location']['lat']

42.3709104

In [16]:
results[0]['geometry']['location']['lng']

-72.5170028

Success. We create a coordinates tuple using the latitude and longitude values.

In [17]:
# Extract latitude and longitude
latitude = results[0]['geometry']['location']['lat']
longitude = results[0]['geometry']['location']['lng']

# Create coordinates tuple
coordinates = (latitude, longitude)
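The indexing steps above can be wrapped in a small function. This is a minimal sketch: the name `extract_coordinates` is hypothetical, and it assumes that a query with no match yields an empty list, in which case it returns `None` rather than failing on `results[0]`.

```python
def extract_coordinates(results):
    """Return (latitude, longitude) from the first geocoding result, or None."""
    if not results:
        # Assumption: an unmatched query produces an empty results list
        return None
    location = results[0]['geometry']['location']
    return (location['lat'], location['lng'])


# Trimmed copy of the Amherst College result shown above
sample_results = [
    {'geometry': {'location': {'lat': 42.3709104, 'lng': -72.5170028}}}
]
sample_coordinates = extract_coordinates(sample_results)
```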

We iterate over each venue in our list of venue names and query the `Google Maps API` for its coordinates. If coordinates are found, we add the venue name and coordinates to our data dictionary. If the lookup fails, we raise an exception that names the venue, which stops the loop.

In [18]:
for venue_name in venue_names:

    try:

        # Query Google Maps API for GPS coordinates
        results = gmaps.geocode(address=venue_name)

        # Extract latitude and longitude
        latitude = results[0]['geometry']['location']['lat']
        longitude = results[0]['geometry']['location']['lng']

        # Create coordinates tuple
        coordinates = (latitude, longitude)

        # Add values to data dictionary
        coordinates_data['Venue Name'].append(venue_name)
        coordinates_data['Coordinates'].append(coordinates)
        coordinates_data['Latitude'].append(latitude)
        coordinates_data['Longitude'].append(longitude)

    except Exception as err:

        # Re-raise with the venue name, keeping the original error as the cause
        raise Exception("Error finding the GPS coordinates for {}.".format(venue_name)) from err
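Raising on the first failure discards the rest of the run. If it is preferable to geocode every venue and review the failures afterwards, the loop can collect them instead. The sketch below is one way to do that, not the notebook's method. `geocode_all` and `fake_geocode` are hypothetical names, and the stand-in geocoder exists only so the example runs without an API key.

```python
def geocode_all(venue_names, geocode):
    """Geocode each name; return (found, failed) without stopping on errors."""
    found = {}
    failed = []
    for venue_name in venue_names:
        try:
            results = geocode(venue_name)
            location = results[0]['geometry']['location']
        except Exception as err:
            # Record the failure and move on to the next venue
            failed.append((venue_name, repr(err)))
            continue
        found[venue_name] = (location['lat'], location['lng'])
    return found, failed


def fake_geocode(address):
    """Stand-in geocoder: knows one venue, returns no results otherwise."""
    known = {'Amherst College': {'lat': 42.3709104, 'lng': -72.5170028}}
    if address in known:
        return [{'geometry': {'location': known[address]}}]
    return []


found, failed = geocode_all(['Amherst College', 'Nowhere Hall'], fake_geocode)
```

In the notebook, passing `gmaps.geocode` as the second argument would use the real client.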

Once we have iterated over all venues in our list, we create and preview a coordinates DataFrame.

In [20]:
# Create coordinates DataFrame
coordinates_df = pd.DataFrame(coordinates_data)

# Preview coordinates DataFrame
coordinates_df

| | Venue Name | Coordinates | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Amherst College | (42.3709104, -72.5170028) | 42.37091 | -72.517003 |
| 1 | Bates College | (44.1057216, -70.2021865) | 44.105722 | -70.202186 |
| 2 | Bowdoin College | (43.9076929, -69.9639971) | 43.907693 | -69.963997 |
| 3 | Colby College | (44.5638691, -69.6626362) | 44.563869 | -69.662636 |
| 4 | Connecticut College | (41.3786923, -72.1046019) | 41.378692 | -72.104602 |
| 5 | Hamilton College | (43.0527984, -75.4059719) | 43.052798 | -75.405972 |
| 6 | Middlebury College | (44.0081076, -73.1760413) | 44.008108 | -73.176041 |
| 7 | Trinity College | (41.7478797, -72.6905199) | 41.74788 | -72.69052 |
| 8 | Tufts University | (42.4074843, -71.1190232) | 42.407484 | -71.119023 |
| 9 | Wesleyan University | (41.5566104, -72.65690409999999) | 41.55661 | -72.656904 |


All set.


## Map Venue Names to Venue IDs

We ultimately want to use our coordinates DataFrame alongside a DataFrame that identifies the same venues by a unique integer ID rather than by name. For this to work, the venue identifiers in both DataFrames need to align.

The venue name to venue ID mapping in the other DataFrame assigns consecutive IDs to an alphabetically-sorted (A to Z) list of venue names. We can build a corresponding venue name to ID mapping dictionary.

In [21]:
# Sort list of venue names
venue_names.sort()

# Create dictionary for venue name to venue id mapping
name_to_id_mapping = {venue: idx for idx, venue in enumerate(venue_names)}

# Preview dictionary
name_to_id_mapping

{'Amherst College': 0,
 'Bates College': 1,
 'Bowdoin College': 2,
 'Colby College': 3,
 'Connecticut College': 4,
 'Hamilton College': 5,
 'Middlebury College': 6,
 'Trinity College': 7,
 'Tufts University': 8,
 'Wesleyan University': 9,
 'Williams College': 10}

We then use the DataFrame `replace` method to apply the venue name to ID mapping to our coordinates DataFrame.

In [23]:
# Map venue name to venue id in DataFrame
coordinates_df = coordinates_df.replace({
    'Venue Name': name_to_id_mapping})

# Preview DataFrame
coordinates_df.head()

| | Venue Name | Coordinates | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 0 | (42.3709104, -72.5170028) | 42.37091 | -72.517003 |
| 1 | 1 | (44.1057216, -70.2021865) | 44.105722 | -70.202186 |
| 2 | 2 | (43.9076929, -69.9639971) | 43.907693 | -69.963997 |
| 3 | 3 | (44.5638691, -69.6626362) | 44.563869 | -69.662636 |
| 4 | 4 | (41.3786923, -72.1046019) | 41.378692 | -72.104602 |
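One caveat with `replace`: a name that is missing from the mapping is left in the column as a string, so a mismatch can go unnoticed. A possible alternative, sketched here on toy data, is `Series.map`, which produces `NaN` for unmapped names and makes them easy to list. The venue `Unknown Hall` and the `Venue ID` column are made up for illustration.

```python
import pandas as pd

# Toy mapping and DataFrame; 'Unknown Hall' is deliberately absent from the mapping
name_to_id = {'Amherst College': 0, 'Bates College': 1}
df = pd.DataFrame({'Venue Name': ['Amherst College', 'Bates College', 'Unknown Hall']})

# map() yields NaN wherever the name has no entry in the dictionary
df['Venue ID'] = df['Venue Name'].map(name_to_id)

# List the names that could not be mapped
unmapped = df.loc[df['Venue ID'].isna(), 'Venue Name'].tolist()
```

An empty `unmapped` list would confirm that every venue received an ID.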


It looks like our coordinates DataFrame is all set. 

As a final step, we save our DataFrame to CSV so we can access it elsewhere.

In [25]:
# Save coordinates DataFrame to CSV
coordinates_df.to_csv('data/venues_coordinates.csv')
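Two details of the CSV round trip are worth knowing. By default `to_csv` also writes the row index, which comes back as an `Unnamed: 0` column on reload, and tuples in the `Coordinates` column are stored as text. The sketch below, using a one-row stand-in DataFrame and a temporary file, shows one way to handle both under those assumptions: `index=False` on save and `ast.literal_eval` on load.

```python
import ast
import os
import tempfile

import pandas as pd

# One-row stand-in for the coordinates DataFrame
df = pd.DataFrame({
    'Venue Name': [0],
    'Coordinates': [(42.3709104, -72.5170028)],
    'Latitude': [42.3709104],
    'Longitude': [-72.5170028],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'venues_coordinates.csv')
    # index=False keeps the row index out of the file
    df.to_csv(path, index=False)
    restored = pd.read_csv(path)

# The tuples were saved as strings; parse them back into tuples
restored['Coordinates'] = restored['Coordinates'].apply(ast.literal_eval)
```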

Done!