# Data Analysis: Google Places API

Many government agencies <a href="https://www.data.gov/developers/government-apis/">have APIs</a>, as do popular companies like <a href="https://developers.google.com/places/">Google</a> and <a href="https://www.yelp.com/developers/documentation/v3">Yelp</a>. When available, APIs are a good resource for pulling data programmatically.

The steps to using any API can be summarized as follows:
<ol>
<li>Get an API key</li>
<li>Build a URL/query (see API documentation)</li>
<li>Submit the API request</li>
<li>Parse the response</li>
<li>Store, export, etc. for analysis</li>
</ol>

In my experience, Google has one of the most intuitive and well-documented APIs, which makes it a good introduction to making API calls. In this post, we will use the Google Places API Web Service to make a "text search request", which is explained <a href="https://developers.google.com/places/web-service/search#TextSearchRequests">here</a>. Using this service, we will submit a search for a business and get back the matching businesses and their locations (much like an ordinary Google search). We will then compile the businesses Google returns, along with part of their address information, in a Pandas data frame.

### Import Libraries

I will use the <b>Requests</b> library, an intuitive library for working with APIs, to make the API requests, and the <b>Pandas</b> library for the data frame.

In [1]:
import requests
import pandas as pd

### 1. Build the Request URL

The documentation for sending a text search request states that the URL must take the following form:
```
https://maps.googleapis.com/maps/api/place/textsearch/output?parameters
```

Here, ```output``` is the format in which we would like the data returned, and ```parameters``` contains details about what we are searching for, as well as our API key. We will only pass the two required parameters shown on the documentation page, ```key``` and ```query```.

#### Fixed Components
Most of the URL won't change; it is fixed across repeated API calls. That includes the URL up to ```output```, as well as the output argument and the API key parameter. The base of the URL will be stored in <b>base_url</b>, the API key in <b>api_key</b>, and the output argument in <b>output</b>. For our search, we will request the output in JSON format.

Note: the value of <b>api_key</b> is a <i>placeholder</i>, not the key I used to perform the request (sharing a real key publicly would be irresponsible).

In [2]:
base_url = 'https://maps.googleapis.com/maps/api/place/textsearch/'
api_key = 'API_KEY'
output = 'json'
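
One way to keep the real key out of the notebook entirely is to read it from an environment variable. The sketch below assumes a variable named `GOOGLE_PLACES_API_KEY`, a hypothetical name chosen for this example rather than anything Google requires, and falls back to the placeholder when the variable is not set.

```python
import os

def load_api_key(env_var='GOOGLE_PLACES_API_KEY', default='API_KEY'):
    """Read the API key from the environment, or return a placeholder.

    The variable name is an assumption made for this example.
    """
    return os.environ.get(env_var, default)

api_key = load_api_key()
```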

#### Query Parameter
The ```query``` parameter tells Google what to look for. I am going to define a search with three pieces of information: (1) the store name, (2) the city, and (3) the state. These will be stored in the <b>store_name</b>, <b>store_city</b>, and <b>store_state</b> variables and concatenated together as <b>store_search</b>. As is standard for URL query strings, blank spaces will be replaced with a plus sign.

Printing <b>store_search</b> shows the final assignment of the ```query``` parameter.

In [3]:
store_name = 'Walmart'
store_city = 'Washington'
store_state = 'DC'

store_search = (store_name + ' ' + store_city + ', ' + store_state).replace(' ','+')
print(store_search)

Walmart+Washington,+DC
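
Replacing spaces by hand works for this search, but a store name containing a character such as `&` would break the URL, because `&` separates parameters. As a hedged alternative, the standard library's `urllib.parse.quote_plus` handles spaces and reserved characters together. The helper name `encode_query` and the store name `Ben & Jerry's` are only illustrations and are not part of the original search.

```python
from urllib.parse import quote_plus

def encode_query(name, city, state):
    """Join the search terms and URL-encode the result."""
    return quote_plus(f"{name} {city}, {state}")

walmart_query = encode_query('Walmart', 'Washington', 'DC')
tricky_query = encode_query("Ben & Jerry's", 'Washington', 'DC')

print(walmart_query)
print(tricky_query)
```

Note that `quote_plus` also percent-encodes the comma (as `%2C`), so the string differs from the hand-built one, although it decodes to the same search text.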


#### Complete URL

Putting the fixed components together with the query parameter completes the URL. We concatenate all of the parts and store the result in <b>complete_url</b>. Printing <b>complete_url</b> shows the URL we will pass to Google (with my API key replaced by a placeholder, of course).

In [4]:
complete_url = base_url + output + '?query=' + store_search + '&key=' + api_key
print(complete_url.replace(api_key, 'API_KEY'))

https://maps.googleapis.com/maps/api/place/textsearch/json?query=Walmart+Washington,+DC&key=API_KEY
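
The same URL can also be assembled with `urllib.parse.urlencode`, which encodes every parameter in one step. This is a minimal sketch rather than the approach used in this post; the helper names `build_text_search_url` and `redact_key` and the key `my-secret-key` are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://maps.googleapis.com/maps/api/place/textsearch/'

def build_text_search_url(query, api_key, output='json'):
    """Assemble a text search URL from the two required parameters."""
    params = urlencode({'query': query, 'key': api_key})
    return f"{BASE_URL}{output}?{params}"

def redact_key(url, api_key, placeholder='API_KEY'):
    """Hide the real key before printing or logging a URL."""
    return url.replace(api_key, placeholder)

url = build_text_search_url('Walmart Washington, DC', 'my-secret-key')
print(redact_key(url, 'my-secret-key'))
```

Keeping the redaction in one helper makes it harder to print the real key by accident.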


### 2. Submit the API Request

We will use the <b>Requests</b> library to submit our request to Google's servers, which will return a "response" object. The response object has a <b>text</b> attribute that holds the body of the response as a string, which makes it easy to print for instructive purposes. Notice that the response looks almost identical to the <a href="https://developers.google.com/places/web-service/search#PlaceSearchResponses">response example</a> in the documentation. Feel free to browse the results!

In [5]:
response = requests.get(complete_url)
print(response.text)

{
   "html_attributions" : [],
   "results" : [
      {
         "formatted_address" : "99 H St NW, Washington, DC 20001, United States",
         "geometry" : {
            "location" : {
               "lat" : 38.9004177,
               "lng" : -77.01193379999999
            },
            "viewport" : {
               "northeast" : {
                  "lat" : 38.9016865802915,
                  "lng" : -77.01061981970851
               },
               "southwest" : {
                  "lat" : 38.8989886197085,
                  "lng" : -77.01331778029152
               }
            }
         },
         "icon" : "https://maps.gstatic.com/mapfiles/place_api/icons/shopping-71.png",
         "id" : "db2cc938bca138dd9c479d2ec0f3abd57a9de235",
         "name" : "Walmart Supercenter",
         "opening_hours" : {
            "open_now" : true,
            "weekday_text" : []
         },
         "photos" : [
            {
               "height" : 2988,
               "html_attributio

### 3. Convert the Response

To begin extracting data from the response object, I need to parse its JSON body into a Python object. This can be done by calling the <b>json()</b> method on the response object. Printing the type of the parsed result, <b>response_json</b>, confirms that we have a workable dictionary!

For example, according to the Google Places API documentation, the response has an element called "results", which itself is a list that contains information on each of the stores returned in the response. I picked Walmart because I know there are exactly three in DC, which we can confirm by counting the total number of elements in the "results" list.

In [6]:
response_json = response.json()
print(type(response_json))
print(len(response_json['results']))

<class 'dict'>
3
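
Before relying on "results", it can help to check the "status" field that the Places API includes at the top level of each response. The sketch below assumes the documented values `OK` and `ZERO_RESULTS`; the helper `extract_results` and the sample payloads are hypothetical and exist only to illustrate the check.

```python
def extract_results(payload):
    """Return the list of places, or raise if the API reported a problem."""
    status = payload.get('status')
    if status == 'ZERO_RESULTS':
        # A valid search that simply matched nothing
        return []
    if status != 'OK':
        message = payload.get('error_message', 'no error message provided')
        raise RuntimeError(f"Places API returned status {status!r}: {message}")
    return payload.get('results', [])

# Hand-written stand-ins for response.json()
ok_payload = {'status': 'OK', 'results': [{'name': 'Walmart Supercenter'}]}
empty_payload = {'status': 'ZERO_RESULTS', 'results': []}

print(len(extract_results(ok_payload)))
print(extract_results(empty_payload))
```

With the real response, this would be called as `extract_results(response_json)` in place of indexing `response_json['results']` directly.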


### 4. Compile Data into Data Frame

I want to compile each store's name, city, and state from the results into a data frame. For instructive purposes, let's start by printing the name of the business and the formatted address for all three Walmarts Google found in DC. As we can see, "formatted_address" includes the street number, street name, city, state, zip code, and country all in one field. We will need to separate the field into its components.

In [7]:
for business in response_json['results']:
    print(business['name'], business['formatted_address'])

Walmart Supercenter 99 H St NW, Washington, DC 20001, United States
Walmart 310 Riggs Rd NE, Washington, DC 20011, United States
Walmart Supercenter 5929 Georgia Ave NW, Washington, DC 20011, United States


Let's put the results into a Pandas data frame called <b>business_df</b>.

In [8]:
business_df = pd.DataFrame(data=response_json['results'])
business_df.head()

| | formatted_address | geometry | icon | id | name | opening_hours | photos | place_id | price_level | rating | reference | types |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 99 H St NW, Washington, DC 20001, United States | `{'location': {'lng': -77.0119338, 'lat': 38.90...` | `https://maps.gstatic.com/mapfiles/place_api/ic...` | db2cc938bca138dd9c479d2ec0f3abd57a9de235 | Walmart Supercenter | `{'weekday_text': [], 'open_now': True}` | `[{'html_attributions': ['<a href="https://maps...` | `ChIJlRmAQ4q3t4kRDhaR4yfucJg` | 1 | 3.7 | `CmRSAAAAq3iHBK5uw5CgzbFxm_lg-WbXtBrFw4B704oAUm...` | `[department_store, supermarket, electronics_st...` |
| 1 | 310 Riggs Rd NE, Washington, DC 20011, United ... | `{'location': {'lng': -77.00244690000001, 'lat'...` | `https://maps.gstatic.com/mapfiles/place_api/ic...` | c3272c8822868df32cf2fd4d3ed2351ab76b5c2d | Walmart | `{'weekday_text': [], 'open_now': True}` | `[{'html_attributions': ['<a href="https://maps...` | `ChIJywIxCdHHt4kRDNBJeIZ1Jbc` | 1 | 4.1 | `CmRSAAAAXQqZbMDKd5n-eJpfQEoKJRlPPVT9ELm1nGiRj-...` | `[department_store, supermarket, grocery_or_sup...` |
| 2 | 5929 Georgia Ave NW, Washington, DC 20011, Uni... | `{'location': {'lng': -77.0273886, 'lat': 38.96...` | `https://maps.gstatic.com/mapfiles/place_api/ic...` | 4e7f7c568c2d8f017e20323c64159dd4ab30fa69 | Walmart Supercenter | `{'weekday_text': [], 'open_now': True}` | `[{'html_attributions': ['<a href="https://maps...` | `ChIJy1391WPIt4kRqpZ2j_HGM38` | 1 | 4.0 | `CmRRAAAAx8FKtAQrhhTYr1oEcI-Nt4prUyvnkpPVtkd6oM...` | `[department_store, supermarket, electronics_st...` |


Let's drop all the variables I have deemed unnecessary, leaving us with only the name of the business and the formatted_address. Variables can be dropped using the data frame's <b>drop</b> method.

In [9]:
business_df = business_df.drop(['geometry','icon','id','opening_hours','photos',
                                'place_id','price_level','rating','types','reference'], axis=1)
business_df.head()

| | formatted_address | name |
|---|---|---|
| 0 | 99 H St NW, Washington, DC 20001, United States | Walmart Supercenter |
| 1 | 310 Riggs Rd NE, Washington, DC 20011, United ... | Walmart |
| 2 | 5929 Georgia Ave NW, Washington, DC 20011, Uni... | Walmart Supercenter |
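
Dropping columns by name works, but the list has to match whatever fields Google happens to return. A hedged alternative is to keep only the fields we want before building the data frame. The helper `select_fields` is a made-up name, and the `results` list below is a trimmed, hand-written stand-in for `response_json['results']`.

```python
def select_fields(results, fields=('name', 'formatted_address')):
    """Keep only the requested keys from each result; missing keys become None."""
    return [{field: place.get(field) for field in fields} for place in results]

# Stand-in for response_json['results'], reduced to a few fields
results = [
    {'name': 'Walmart Supercenter',
     'formatted_address': '99 H St NW, Washington, DC 20001, United States',
     'rating': 3.7},
    {'name': 'Walmart',
     'formatted_address': '310 Riggs Rd NE, Washington, DC 20011, United States',
     'rating': 4.1},
]

rows = select_fields(results)
print(rows[0])
```

Passing `rows` to `pd.DataFrame` would then give a frame with just those two columns.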


My first instinct is to split <b>formatted_address</b> at each comma using the <b>str.split()</b> method in the <b>Pandas</b> library. Let's store the split address as <b>formatted_address_delim</b>.

In [10]:
business_df['formatted_address_delim'] = business_df['formatted_address'].str.split(pat=',')
business_df.head()

| | formatted_address | name | formatted_address_delim |
|---|---|---|---|
| 0 | 99 H St NW, Washington, DC 20001, United States | Walmart Supercenter | `[99 H St NW, Washington, DC 20001, United S...` |
| 1 | 310 Riggs Rd NE, Washington, DC 20011, United ... | Walmart | `[310 Riggs Rd NE, Washington, DC 20011, Uni...` |
| 2 | 5929 Georgia Ave NW, Washington, DC 20011, Uni... | Walmart Supercenter | `[5929 Georgia Ave NW, Washington, DC 20011, ...` |


Notice that each value of <b>formatted_address_delim</b> is a list with four elements. The first element is the street information and the second element is the city, but the third element is a combination of the state and zip code. Note also that every element after the first begins with a blank space. In order to capture the state only, we will need to take a substring of the third element of <b>formatted_address_delim</b>.

In [11]:
print(business_df['formatted_address_delim'][0])

['99 H St NW', ' Washington', ' DC 20001', ' United States']
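
Because the third element mixes the state and zip code, one option is a small helper that splits the address and then splits that element again. This is a sketch that assumes the four-part US layout seen in these three results (street, city, state plus zip code, country); addresses with suite numbers or other extra commas would need more care. The function name `parse_us_address` is made up for this example.

```python
def parse_us_address(formatted_address):
    """Split 'street, city, ST zip, country' into its named components."""
    parts = [part.strip() for part in formatted_address.split(',')]
    if len(parts) != 4:
        raise ValueError(f"Unexpected address layout: {formatted_address!r}")
    street, city, state_zip, country = parts
    # 'DC 20001' -> state 'DC', zip code '20001'
    state, _, zip_code = state_zip.partition(' ')
    return {'street': street, 'city': city, 'state': state,
            'zip_code': zip_code, 'country': country}

parsed = parse_us_address('99 H St NW, Washington, DC 20001, United States')
print(parsed)
```

Raising on an unexpected layout is a deliberate choice here: a silent mis-split would put the wrong value in the state column without any warning.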


Create three new variables, <b>street</b>, <b>city</b>, and <b>state</b>. Using the <b>iterrows()</b> function to loop over each row, we will set the value of the variables <b>street</b>, <b>city</b>, and <b>state</b> based on their location in the variable <b>formatted_address_delim</b>.

In [12]:
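# Start with empty columns, then fill them in row by row
business_df['street'] = ""
business_df['city'] = ""
business_df['state'] = ""

for index, address in business_df.iterrows():
    parts = address['formatted_address_delim']
    # .loc assigns into the frame itself; chained indexing such as
    # business_df['street'][index] may write to a temporary copy
    business_df.loc[index, 'street'] = parts[0].strip()
    business_df.loc[index, 'city'] = parts[1].strip()
    # Third element looks like ' DC 20001'; keep only the two-letter state
    business_df.loc[index, 'state'] = parts[2].strip()[:2]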

business_df.head()

| | formatted_address | name | formatted_address_delim | street | city | state |
|---|---|---|---|---|---|---|
| 0 | 99 H St NW, Washington, DC 20001, United States | Walmart Supercenter | `[99 H St NW, Washington, DC 20001, United S...` | 99 H St NW | Washington | DC |
| 1 | 310 Riggs Rd NE, Washington, DC 20011, United ... | Walmart | `[310 Riggs Rd NE, Washington, DC 20011, Uni...` | 310 Riggs Rd NE | Washington | DC |
| 2 | 5929 Georgia Ave NW, Washington, DC 20011, Uni... | Walmart Supercenter | `[5929 Georgia Ave NW, Washington, DC 20011, ...` | 5929 Georgia Ave NW | Washington | DC |
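
For larger result sets, the row-by-row loop could be replaced with vectorized string methods. The sketch below rebuilds a two-row frame by hand so that it runs on its own, and it assumes the same four-part address layout described earlier; the frame name `addresses` is chosen only for this example.

```python
import pandas as pd

# Hand-built stand-in for business_df with two of the three stores
addresses = pd.DataFrame({
    'name': ['Walmart Supercenter', 'Walmart'],
    'formatted_address': [
        '99 H St NW, Washington, DC 20001, United States',
        '310 Riggs Rd NE, Washington, DC 20011, United States',
    ],
})

# expand=True returns one column per comma-separated piece (0, 1, 2, 3)
parts = addresses['formatted_address'].str.split(',', expand=True)

addresses['street'] = parts[0].str.strip()
addresses['city'] = parts[1].str.strip()
# Third piece looks like ' DC 20001'; strip it, then keep the first two letters
addresses['state'] = parts[2].str.strip().str[:2]

print(addresses[['name', 'street', 'city', 'state']])
```

This avoids both the explicit loop and the intermediate list column, at the cost of being a little less step-by-step.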


Now we clean it up by dropping the <b>formatted_address</b> and <b>formatted_address_delim</b> variables.

In [13]:
business_df = business_df.drop(['formatted_address', 'formatted_address_delim'], axis = 1)
business_df.head()

| | name | street | city | state |
|---|---|---|---|---|
| 0 | Walmart Supercenter | 99 H St NW | Washington | DC |
| 1 | Walmart | 310 Riggs Rd NE | Washington | DC |
| 2 | Walmart Supercenter | 5929 Georgia Ave NW | Washington | DC |
