# Google Places API Data Notebook:

 This notebook contains code that can allow for the user to collect data for a number of establishments within the San Diego County area (or with a slight modification, within the area of any other location, as well) and assigns to each establishment from the area the FEMA lifeline it is associated with. It takes advantage of data on the web accessible from the Google Places API, and after the user is granted access to the API by obtaining and enabling an API key through Google and then putting it in the code, it will make requests from the API using different types of searches, in particular, Nearby Search and Text Search. The reason these types of place searches were chosen was so we could include in the URL for each request a term or type of place that we wanted establishments for (from which we had a list to search through, for each FEMA lifeline and its components), and we could also use geographic coordinates of latitude and longitude for the location, and a radius for the amount of geographic area at the location (in meters) for the search to use. The code has a function that is called individually for each lifeline, and the notebook code then takes all of the results for each lifeline with the lifeline number (specified 1-7, in accordance with the FEMA Community Lifeline Components, see page 8 of source: - https://www.fema.gov/media-library-data/1550596598262-99b1671f270c18c934294a449bcca3ce/Tab1b.CommunityLifelinesResponseToolkit_508.pdf) and then puts them together into a single dataset. While the function is running, to let the user know what's going on, it prints out the terms for each lifeline component currently being searched for and whether or not it was able to access a page with content on places for them, as well as additional pages for the search being accessed by also printing out the next page token that directed the URL to that next page to then be accessed. Before exporting the final dataset to a .csv file, it tries to remove any duplicate establishments that may have come up from all of the different the searches.

In [1]:
import pandas as pd
import time
import requests
import json
from pandas.io.json import json_normalize

# Function for retrieving data per Lifeline:

In [2]:
# after enabling your own Google Places API key, must include it here:
my_api_key = ''

In [5]:
# string used as main part of url for Nearby Search Google Places API query: (can change this to include coordinates 
# and radius <= 50000 of your own choosing)
base_url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=32.715736,-117.161087&radius=50000'

In [6]:
# strings here are parts of url for doing the Text Search Google Places API queries:
base_url_2 = 'https://maps.googleapis.com/maps/api/place/textsearch/json?query='

base_url_2_loc = '+in+San+Diego&location=32.715736,-117.161087&radius=50000'

In [7]:
# defining headers for url request:
headers={'content-type':'application/json'}

In [8]:
next_page = ''

In [9]:
def get_life_line_dataframe(list_of_places, list_of_search_terms, lifelinenum):
    
    # dataframe for the rows for establishments we get from the Nearby Search queries
    df_places = pd.DataFrame()

    # dataframe for the rows for establishments we get from the Text Search queries
    df_terms = pd.DataFrame()
    
    # checks for if there is any content in specified list for the Nearby Search, and if so, proceeds
    if (list_of_places):
        # for-loop iterates through each of the items in list of places:
        for place in list_of_places:
            # current indicator for next page is first set to blank, until next page information (if any) is added
            next_page = ''
            # nested for-loop makes requests for more pages for the current query
            # chose a maximum number of pages to go through
            for x in range(1, 10):
                # request consists of URL containing part with coordinates, then attaches the place type, next page token, and the api key
                res = requests.get(url=base_url+'&type='+place+next_page+'&key='+my_api_key, headers=headers)
                # prints out current search word/place type with status code for page requested to notify user
                print(f'{place} : {x} : {res.status_code}')
                # sleep time for giving the server some room
                time.sleep(5)
                # json information from request is stored in variable data
                data = res.json()
                # checks if there was a valid status code, and there was access with OK status, and that there were actually any results in
                # in the page that was reached
                if ((res.status_code == 200) and (data['status']=='OK') and (data['results'])):
                    # prints only after these conditions are met to tell user that we basically accessed a page with results
                    print(f'{place} : {x} : valid page')
                    # json results data is stored as a dictionary and then put into a dataframe
                    df = pd.DataFrame.from_dict(json_normalize(data['results']), orient='columns')
                    # code lines with for-loop below change values in types columns to be just strings with comma separations
                    # so as to not be dictionaries, and for us to still have types information 
                    category_list = df['types']
                    new_list = []
                    for j in category_list:
                        string = ', '.join(i for i in j)
                        new_list.append(string.lower())
                    
                    # new dataframe gets created with same values as original
                    extra_df = df
                    # types column for this dataframe get changed to strings version
                    extra_df['types'] = new_list
                    # this new dataframe is what gets appended to dataframe for Nearby Search
                    df_places = df_places.append(extra_df, ignore_index=True)

                    # try-except: 
                    # handles exception for when it checks for a next page token and can't find one:
                    # but if it can, it sets it to next page token variable, else it gets reset to blank and we break out of the current
                    # loop iteration
                    try:
                        if (data['next_page_token'] is not None):
                            next_page = '&pagetoken='+data['next_page_token']
                            print('next page token:'+data['next_page_token'])
                        else:
                            next_page = ''
                            break
                    except:
                        next_page = ''
                        break
    
    # checks for if there is any content in specified list for the Text Search, and if so, proceeds
    # process here follows for Text Search in very similar manner to above for Nearby Search 
    if (list_of_search_terms):
        for term in list_of_search_terms:
            next_page = ''
            for x in range(1, 10):
                # spaces in search term are replaced with plus signs so as to be contained in URL
                term_string = term.replace(' ', '+')
                # here this similar to before only term is placed between both parts (the latter with the geographic info), 
                # then the rest is added on
                res = requests.get(url=base_url_2+term_string+base_url_2_loc+next_page+'&key='+my_api_key, headers=headers)

                print(f'{term} : {x} : {res.status_code}')
                time.sleep(5)
                data = res.json()
                if ((res.status_code == 200) and (data['status']=='OK') and (data['results'])):
                    print(f'{term} : {x} : valid page')

                    df = pd.DataFrame.from_dict(json_normalize(data['results']), orient='columns')

                    category_list = df['types']
                    new_list = []
                    for j in category_list:
                        string = ', '.join(i for i in j)
                        new_list.append(string.lower())

                    extra_df = df    
                    extra_df['types'] = new_list

                    df_terms = df_terms.append(extra_df, ignore_index=True)

                    # try-except:
                    try:
                        if (data['next_page_token'] is not None):
                            next_page = '&pagetoken='+data['next_page_token']
                            print('next page token:'+data['next_page_token'])
                        else:
                            next_page = ''
                            break
                    except:
                        next_page = ''
                        break


    # merges dataframes storing rows from Nearby Search and Text Search into one dataframe
    df_places = pd.concat([df_places, df_terms], axis=0, ignore_index=True)

    # removing rows that are entirely empty
    df_places.dropna(axis=0, how='all', inplace=True)
    # removing photos column since we aren't looking for photos at this time, and this avoids issues later
    df_places.drop(columns='photos', inplace=True)
    # removing any duplicated rows
    df_places.drop_duplicates(inplace=True)
    # lifeline column will be a column of just this value
    df_places['lifeline'] = lifelinenum
    # function will then return this dataframe
    return df_places

In [10]:
# -----------------------------------------------------

# Getting DataFrames for each Lifeline:

# Lifeline 1:

In [11]:
# lifeline 1: (Safety & Security)

# lifeline 1 - Nearby Search query place types:
list_of_places_1 = ['local_government_office','fire_station', 'police', 'city_hall', 'post_office']

# lifeline 1 - Text Search query terms:
list_of_search_terms_1 = ['police department', 'police organization', 'emergency management', 'fire department', 'government services', 
                          'embassy', 'law enforcement']

In [12]:
# calling function for lifeline 1:
df_1 = get_life_line_dataframe(list_of_places_1, list_of_search_terms_1, 1)

local_government_office : 1 : 200
local_government_office : 1 : valid page
next page token:CrQCLgEAACp0fq72C3sQL7bUSnbGPmD7GUZVkissz3k0DwsCy-pg8OEiy82_tx4xmAtp_ZCKNCRHL2zdSyAG5zU4VGpd9K9sfnTUwAObRMqLW2xJ-FdwbxkCC6hbZoO6gFwMbkECyuMxBjcoNzuQvz93iLCRmFwlVSTs66yyXVFMsIGhkbyT4YApam6H-XuZLaQvLvDM5gLhyTXfee4lAxb2CWZkNYTj5Xe4Qwnty8cQA4y_Cuti5BhVrLfLsGepcTlLvbHbVMcnWhgP8Mit89KQ10QVwNvKyncBXPAcMueW1rYVMHfuZcw08PQWISg_tqn6ilFmZmmCLa5-ym51uhkN_Q4Aa__Otn3hG6pdSW11BNwO7J_lNH-P7-V0_jCdGSGsl8cV7otvhdS1gWsc36hOUfW1DcQSEJjfNVJ8hV6TXK6BhHB-xCQaFOwfx6JfcKSLOBC1o3XEQPZQ-3oD
local_government_office : 2 : 200
local_government_office : 2 : valid page
next page token:CtQDzgEAAEk_oSc2Xeuzxp17qpcDRMEoTzxlOgHdAyo0VgsQY_hOEDstFqYkw2D3JE-C6aDY0tFDXnI_1RH0OD_CaAhRfQ5IO4fheyr4ae4LiazjYuMpRCkfEHlOlP30fwT76qYMhBLLmA6L5GWFF7U-IyLzpMOZVhWfMdtPVvF5lPB63puqXCUufGyPNl7-0_X8aOSl49pcpvFxL7l7X_SUdokk15A5vreRBLrjV6z7bns_VmbA3yI3MogjNae--8FggP_l0VX3o-IcNIf37FptJD9oNUwjs0ft4qvN0p8N_neZyt4-KNNk4RYWBnwE2eY0DDU5usUXorIpnYoLbs1aydxmY

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  sort=sort,


local_government_office : 3 : 200
local_government_office : 3 : valid page
fire_station : 1 : 200
fire_station : 1 : valid page
next page token:CrQCIwEAABw7KSE7pEmj_TBidQZkPF2P6Uqi5khJyS7ODf816GPydY-dZ9tezYYpoypEtQM9U3gIt5Ln_rLvANsR-bF-BxoxnquGYCBsDdKfzDqepEQ4eF3Gp_gzsRQuRUE8nkqXulcsDR_aNbCkQAPWPKVOcsgN7bk4ytKY-_f4EFKzsJ92f8foi-0UO8R22WPqm5RmXq-ZAu_JaG9aBxHJHKCRRfGcHT2u1fvan5khnr_y2Mu7sGZn9JieKp1JkFuvQwRzkDm3iQODfigSLVbBhvp9gHBtpXvMc41P4YlD48j3VfVwn3-jQDsW4qpK1VUMbEDr0h39Vcb75DjdX469tXI4me_iNhG7GXyNvM4s76pnTPSxCnMtguyQWaUmhEvCLbeLV0ewv3m5fPtMJ8obb2oCrYwSEDa-PxSxa1SlMEjdMEB1P0oaFNyE8IpAeGvmLb7KRtty-Y0x-Ofh
fire_station : 2 : 200
fire_station : 2 : valid page
next page token:CtQDwwEAAJ3kVy12-XhYgipWooUWXhHSCUjAG1_mOH4nb0LaNZ2-R_nIYM3W8IN14vsEoLNc108-VZEwHN-U-nx7YaNElEkc-1qXd33-OW6bmryPSZqAw8VFGkOnwFwYRyhAFy9bVj19fTBygUxPdS0Kx1IkDyrVO-8eDFNkMC16sF4bKCIUwX214ZvBHtllFSBc4KtKpx5w6hbaJc6VPq8WzGtoyRljlH27jIEh6sOiY_J5jWMvnP5dMpsOHafchg7HhKfmqUxgp2ddnR7bgIrgadTjZdEn4q6GR7vmUHD8zuJfQBF9REJ2KOLS7V

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.




In [13]:
# shape of lifeline 1 dataframe after retrieval:
df_1.shape

(393, 22)

# Lifeline 2:

In [14]:
# lifeline 2: (Food, Water, and Sheltering)

# lifeline 2 - Nearby Search query place types:
list_of_places_2 = ['restaurant', 'cafe', 'supermarket', 'grocery_or_supermarket', 'store', 'grocery', 'lodging', 'stadium', 'liquor_store']

# lifeline 2 - Text Search query terms:
list_of_search_terms_2 = ['water purification', 'food delivery service', 'foodbank', 'water delivery', 'water store','water supplier',
                    'agriculture', 'homeless shelter', 'store', 'grocery', 'cafe', 'supermarket', 'restaurant', 'animal shelter', 
                    'community center', 'lodging', 'stadium', 'arena', 'church', 'museum', 'library', 'school', 'university', 'food']

In [15]:
# calling function for lifeline 2:
df_2 = get_life_line_dataframe(list_of_places_2, list_of_search_terms_2, 2)

restaurant : 1 : 200
restaurant : 1 : valid page
next page token:CrQCIQEAALzUIemEzO-HPBBMbI_XaVT-TWcDglIaLa7xPhOsUbhvPg3171uUZQAjNDCI_X3xGIvZvzfxXxCN2VJTZyg2VhEmLFea3zIEtxn2xgtAb0tdJQyCaqbX7btfYHz7jep6UyT4Rra1VYPmSR4n7SQAMzOgeS7CJtE6Fb0d2tBMeV4Bp1g4rfSL6J0xNr1h-fUfZv9G0jleKKaY7qm0MV2gEhZXgMU-3dapNQJSqiv9JuEzPDPoYXLpYN9DEjXrbG0Aakgewaabt5Q5cm1uhixbuHUP0uQGi3x3_GZHrfDZpo-qxQvF8sckeyBPmoKYCc_uMs323BwkKvSIHNMSxdt0EHyCKLYs6SmdP0BTiiMCAi-2cAi6qdM1SxGLC7cMIWfBE60vMILKMwnMhV_nT8SJ3AcSEGDveanGv2TluIQEL-sAiE8aFLyUhpxg0glt8zTLpx8ZKGMNyvdL
restaurant : 2 : 200
restaurant : 2 : valid page
next page token:CtQDwQEAANM4tbOcvX8ihNmxZM2Rz4SjVhDZ1QDoz0BF_5xz58OlptxeZEl97NRzuoVjWrwqqc2W4jz_6cLYM8dGWFCi8fp4hDJsC_okF9x-lyd4vumH0tMzCWlcctRzWBJQVm34DJrzMnhCD1rvzixTRriNAJ_U_PSI7XSKPfYVynGXfqv8hHDF_Rg2ioT23BepGaGVjDixSd0LinU9ZqfOi8O_A-3e7v8JzxA55tTqqf3LvVoezRMN3RpaDrHy1xsG6qIZjzk8Wwl_R7Wk9h6Q2OvEb2n90ywZ745hYNWqLbc4PrPD-iPKN6a5H35qi0v_9xAjpkLrMEOcuwZDlPTbdpzg4umx1tEVLBjKKoa9Tf5NftMdxTKAwn8_Wpng7TdN5kv5OgasXKOyf

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.




In [16]:
# shape of lifeline 2 dataframe after retrieval:
df_2.shape

(1340, 22)

# Lifeline 3:

In [17]:
# lifeline 3: (Health and Medical)

# lifeline 3 - Nearby Search query place types:
list_of_places_3 = ['doctor', 'hospital', 'veterinary_care','medical_care', 'patient_move', 'urgent_care']

# lifeline 3 - Text Search query terms:
list_of_search_terms_3 = ['health', 'counciling and mental health', 'medcenters', 'hospitals', 'emergency rooms',
                    'cremation services', 'dentist']

In [18]:
# calling function for lifeline 3:
df_3 = get_life_line_dataframe(list_of_places_3, list_of_search_terms_3, 3)

doctor : 1 : 200
doctor : 1 : valid page
next page token:CqQCHQEAAK2rRvXuCvS9GwYU1RiQ8VdnbUo5Rtd74SWrdrqPq1nQk9jlPl5C15AQrReWFuVu8aDbe5p8jNdvAVXpHbIg8Zx7_HTganxTkFfRzLlM_yFiHiv2iKcTgxzxQzNrvnv7jsJGFjv2q8L5E0nVJIyRbTutqwty54TmlmjqRtY0W24s9Y4pI5QzNL0l8OEXKq2d-YM6cDOpJp5d93vppacHz5dbmFI0f_Xe60ZWsQmFdvGHy-AHL_Wsyl6A0BeSqOPEzMZkbGC8i3biIsoL0GimorcA0z-7IO1ODtMUyFsm_XDh5HFd8r6Ax7TfwQfldWQDN9ZOekc21KVIyjwq5a0WZbOfvofU_lr7CHH9g56H-OomQzXYv-M9GOEinKbYpEY6JBIQ9S9KZPXPUV0j3-8G35Vt2xoUnwpGGpZkjI3sN0Ffk0vh_w4FHlw
doctor : 2 : 200
doctor : 2 : valid page
next page token:CsQDvQEAALvrhd9u6RUz4xSMa0lHlRVuTihP9nRaqaWmt8Ju6_pzgJ52AafenAxClBLDAwjTLrTLbNgUR6OKFXFMNdtF9xW8NvEVLnVflwgQ0lm4SpPjtDiVabImMtjHavw8z5acJmhDldzf0UZmd7EaibhLaJLPdfImqHPkgBkM2dRht0FyVmBy6DZSXRVQlj_B-jk-NdfC62gY1qlLi_bDQ4YSN13oECFeLWazpZLPeyeZXN0rkeCjYh_zEffIRPbXLBPp0nggD0pAPwlcDwB4yBdQNhzDb6k3RZ2gRqRU49fVzQOu8pjC3aRwaRoiCLpcqgRx3cM1QNuCtWeiTbyvO_-IQ8zzQY4c5pFqd1fcS1IyrcbtO_zFWqdnNls1jjupdJll7L4J7hYzi0IRTwoFUXBM54nSydRXvKsqhkBHVf3Nn9cf3r

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.




In [19]:
# shape of lifeline 3 dataframe after retrieval:
df_3.shape

(521, 22)

# Lifeline 4:

In [20]:
# lifeline 4: (Energy)

# lifeline 4 - Nearby Search query place types:
list_of_places_4 = ['gas_station']

# lifeline 4 - Text Search query terms:
list_of_search_terms_4 = ['fuel','Fuel supplier', 'pipeline', 'gas company', 'Electrical equipment supplier', 'Electrical substation', 
                    'power plant', 'nuclear power plant', 'electric company']

In [21]:
# calling function for lifeline 4:
df_4 = get_life_line_dataframe(list_of_places_4, list_of_search_terms_4, 4)

gas_station : 1 : 200
gas_station : 1 : valid page
next page token:CrQCIgEAAAKccVqNg1cSo19JAVls_WbvsVF6jQGiVqbR0sTlxsfOErcE0PKUKB_2S0jNm9f8U-Uu8mLldpWF1TmOAiXHMWjXOGxphtsllgpTUuoUn6MZZTpuosyo9arEsJe1c52yjQdv5x37nQve2Uo3HRUzGmKGK3w-Wj1yVHlXpkwVFjo8wIzIqu7NSCUr4vm2cbkku_D7EzHilr3GlOAQ1EL806G2cOp-41tkc4KvWqZ3nWbdkqiPAjyvZu9KRLx1IPzxbWXzKpQkVDDYZMjQ35gm60Oa6gg8vWIH_0OCid_kdkPmmlihyhUKQwqJcEPJdIpB3A7dczmZcSBqqlMrBCGx4C3Vuq2FjovXMfuwbT79sxlnx_yhQl9Op00yv6OCHzBaR5LoEGrdN8sS0TI-UEmxU7ASEF8aXDlcIVlMwbq3M4B86VMaFCvvAbQ55FRx34-R8XoDzDzeCcIv
gas_station : 2 : 200
gas_station : 2 : valid page
next page token:CtQDwgEAAPYJLYU-O1qkKXlQirl0s8pDbQSQhUuo2StCHPs-0ZGZ8Q9bnV-dAljbDuyhy3N1CoFBqKzpjI_TmtWf-mMdNgMlThhzeta0aGX3bwvBkCkMPP_bOjf6_tWt1ujgZFnLzusp75zPdZoo_xWUOwPwuQjNWxK_mCYp-g8U6Zyn3gkUc-L6Tiy8OeExqLukx_TvtzBWM5tibnAJZdpbrpEFjNYq-7qdyOGbvX_yao6tsJfRMInuyA_fHNkk2qBW0GB90nENgL0Dm0dJXZojhSSFQmz8hXnsQPkji3ZoVVbctp3fJusce3-L-r0F-gMi8g0yIs0qQspZKh_b9nAcTVlLTjfQ_tveEt4qoc001kCdsAiZf6fu9gTl1J2fAaKt1aJPxyUdO

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.




In [22]:
# shape of lifeline 4 dataframe after retrieval:
df_4.shape

(418, 22)

# Lifeline 5:

In [23]:
# lifeline 5: (Communications)

# lifeline 5 - Nearby Search query place types:
list_of_places_5 = ['bank', 'atm', 'comm_infrastructure', 'dispatch_911', 'alerts_warnings_messages']

# lifeline 5 - Text Search query terms:
list_of_search_terms_5 = ['cell towers', 'radio stations', 'television stations', 'banks', 'finance']

In [24]:
# calling function for lifeline 5:
df_5 = get_life_line_dataframe(list_of_places_5, list_of_search_terms_5, 5)

bank : 1 : 200
bank : 1 : valid page
next page token:CqQCGwEAAMJ-Ric3xG5ReVmgV_Ps0a-ZF51Yzhg6j7oPuQQwzkBoj0qjQWHjngD_BwnilUe9ykaJCZHlTE3ofgfm11Wa9KkkN78bEojbdopthGv7Q9xAG1aI3BVVTd6BgPwDhgon82haDjZkcOXbtDzHga-RFK-OYToDuSEn379AX0eKTzhrihU27BaemRDkIX-t2q0uGNatWamup9GDPI-tqFu6tQHc3h_nqMJqsDg462-mp9BJF-r5avOuHViUp0BfP-vm9DCW0MtjJJwqHSHLyq9-dHLH5KMjJlYrxQRbmrpks5BVELbrW7f2Vi4lnTrlI-JFe4ijX8fWYBMcIJa8FZfeho-SCea2dF392_kb1tz1Vk7UdF-WLDNTp7crx9vYaFtrwBIQKF59GEtG_hS8BpDOUNhO6hoUq-eX0thc8yEs4MdgPKq6k0W2bcM
bank : 2 : 200
bank : 2 : valid page
next page token:CsQDuwEAABnxP8vuQTqjjXSo-sQF0NGCPyg95Xr6VG_lG8LZB7rEWPDjlk3-JlyyJ_lDTvJP13XuI61dX809InN7uCsJ-q6UIh4F2FQ0JJR-tptw_8I2ZqYkTJZ25Qe50T5tA7_PlJ6-5-zy3df2yqfWpZMAbIzKEH1JZWW2iRcwn--4vrC0K7cdJPreugl5ppsW-Uh5MNi-Y8mzKdGXLj6kw3tCSy5loOuTIOivFnjAhk1N72Hp7hLFerBX_OZCZxOBrMjidefIpxJekiFc5DoOpHhJG0ABmP8khhlpREdbFLEZ3tSoXxtRR1SNCisEghYAxj_jJpIHvZf8d5dVulPZNLUiQNlY-lVUhR_-iFFalx4-f6o5XMgXFdEvxARMLu8zHd2-Z-CuksFD0gubnGvz-F6ly-zUt8pOOF4z5nfwDQQCP2AjbKKn1MaVgP

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.




In [25]:
# shape of lifeline 5 dataframe after retrieval:
df_5.shape

(398, 22)

# Lifeline 6:

In [26]:
# lifeline 6: (Transportation)

# lifeline 6 - Nearby Search query place types:
list_of_places_6 = ['airport', 'bus_station', 'light_rail_station', 'subway_station', 'train_station', 'transit_station', 'car_repair']

# lifeline 6 - Text Search query terms:
list_of_search_terms_6 = ['buses', 'metrostation','public transport','metrostations', 'bus stations','train stations', 
                    'airport', 'port', 'roadside assist', 'towing', 'pipeline']

In [27]:
# calling function for lifeline 6:
df_6 = get_life_line_dataframe(list_of_places_6, list_of_search_terms_6, 6)

airport : 1 : 200
airport : 1 : valid page
next page token:CqQCHgEAANRVSpO42zdvTw4kujj4lhrRFGqaH7xodzrZdz3Fae4uQDdamIlXvU4uKkZDu1DITJno11ZDGmnG27Qypzv_u2uFcrpmGR0rfMPWr-v9j-m8dDM3bECdZ9n6faTID0T9teJ8BBQMgAJcaIu7hZw5RwYotnsZA0FjuFMoaAuzz9yE62m2foUcbNW1kLsPoOeGT7kfSc1Pbj4wk7QhO8htSHHNHT3Ime-Rmiw2dot5lzH5ZKNcivlYeujwi-eDtqn14X10QUDfrD402KpsfiyT0iePGwwHzl1KnkgJDRsrQKaprC4byxk5XT5phRmIf0PEuAzfK5BjpRZqet9EXMDS4etdb5ApOokZN-DQa5lx6QfAkyjJzz7KnTB_TxwT9YOqeRIQUaKdxQFSqQIDq9jWU-BxDxoUl9y0y_3V0-zY6dHqk9yCa_biHto
airport : 2 : 200
airport : 2 : valid page
bus_station : 1 : 200
bus_station : 1 : valid page
light_rail_station : 1 : 200
light_rail_station : 2 : 200
light_rail_station : 3 : 200
light_rail_station : 4 : 200
light_rail_station : 5 : 200
light_rail_station : 6 : 200
light_rail_station : 7 : 200
light_rail_station : 8 : 200
light_rail_station : 9 : 200
subway_station : 1 : 200
subway_station : 2 : 200
subway_station : 3 : 200
subway_station : 4 : 200
subway_station : 5 : 200
subway_statio

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.




In [28]:
# shape of lifeline 6 dataframe after retrieval:
df_6.shape

(428, 22)

# Lifeline 7:

In [29]:
# lifeline 7: (Hazardous Materials)

# lifeline 7 - Nearby Search query place types:
list_of_places_7 = []

# lifeline 7 - Text Search query terms:
list_of_search_terms_7 = ['hazardous waste','nuclear power plant', 'toxic waste', 'recycling', 'hazardous waste']

In [30]:
# calling function for lifeline 7:
df_7 = get_life_line_dataframe(list_of_places_7, list_of_search_terms_7, 7)

hazardous waste : 1 : 200
hazardous waste : 1 : valid page
next page token:CsQCPAEAAIO_VqmkqfRhb8w3ti1-9yGp0xlFB1jVe2ljmfVJW_plFt5pK5mZ0H-OopuP98_tTDLbKSdTcypdfQa2rQm0Bi_ckzrxd2dVbwCQUb1STtoToZrVLp4lnARzhsz6h1XUwELz-5svIqqi3M_SNAuyIvxj_CACIKnkjXWvjRco5ChGN9yLjkVTKpSQ_hXIBQYQtj_6Mg_pyHCARehgonzHOydQU4PnJ8X5O00WyDaRdeOTmZwotmcqn9-Oa7cdbnNNLB2M9h5sB6pLenNYZnqjC5lZrqVVqbQqVhRDoztqajFtYyk02Maxi8Nkbw46JM2c8eMdvQTt0O1G2TjXKDAHQ6-cQOICSv5J__xEO7wxivkuuG9ZtKBKN3wMhHLp-p25jZmZvOZEnCiJr9QYohLSaA1FXq-jIBgPVjQIog-guGl1EhD64LW5A5MIkEELtmr7CuWpGhR5eIkyf8hvNG0TAkVww_dCvZ9_7w
hazardous waste : 2 : 200
hazardous waste : 2 : valid page
next page token:CuQD3AEAAIQBX6UCo0clEQXu7Zlx-6BeBfejSJWaYWQVqY3shsPp44Vnj9CYNe6RiNutQrZ7qPROz40D21EHTaND8WEH7lANJ_-wBb_0sLSkMaFVhGPbmxS8EyrcUFHA-KtaW9X0QU1IMTFnXDK5GIcjDcCdpUCECvWRVYdiBHxJB_iuIsx-d7b3rLO9kxV7JbEoTF-X59Spvi9YH8qLjR4_gpOILdZBLOXpgZGJ6zklnLaf5wonvL_9-w2JjeGeQYLML3QfGUeFMy1s-UcgCD76DvXUh3ZtNwx8_vUVBIviGFCz_N5xMx1NKSdjRlIce8TpQKVnE3U8QF4P5B1rxeJm7h-WnrmsIHOOpw6

In [31]:
# shape of lifeline 7 dataframe after retrieval:
df_7.shape

(112, 19)

# Exporting to final dataframe:

In [32]:
# merging all of the dataframes for each lifeline to a single large dataframe
final_dataframe = pd.concat([df_1, df_2, df_3, df_4, df_5, df_6, df_7], axis=0, ignore_index=True)

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  


In [33]:
# getting size of dataframe after merging all lifeline dataframes:
final_dataframe.shape

(3610, 22)

In [34]:
# removing any rows that are entirely identical to each other with drop_duplicates()
final_dataframe.drop_duplicates(inplace=True)

In [2]:
# checking size after the removal of any identical rows:
final_dataframe.shape

In [36]:
# removing any duplicates based on coordinates:
final_dataframe.drop_duplicates(subset=['geometry.location.lat', 'geometry.location.lng',
       'geometry.viewport.northeast.lat', 'geometry.viewport.northeast.lng',
       'geometry.viewport.southwest.lat', 'geometry.viewport.southwest.lng'], inplace=True)

In [37]:
# checking size of dataframe after this drop duplicates:
final_dataframe.shape

(3307, 22)

In [38]:
# Combining formatted address and vicinity columns:

# imputing any NaNs in both columns with 0s:
final_dataframe['formatted_address'].fillna(0, inplace=True)
final_dataframe['vicinity'].fillna(0, inplace=True)

# replacing long list of 0s in formatted_address column (aligned with where vicinity non-0 values start) with vicinity column's non-0s:
final_dataframe['formatted_address'] = (final_dataframe[~(final_dataframe['vicinity']==0)]['vicinity']).append((final_dataframe[~(final_dataframe['formatted_address']==0)]['formatted_address']))

# after combining formatted_address and vicinity columns into formatted_address, dropping vicinity column:
final_dataframe.drop(columns='vicinity', inplace=True)

In [39]:
# cleaning strings in types column
final_dataframe['types'] = final_dataframe['types'].str.replace(', point_of_interest', '')
final_dataframe['types'] = final_dataframe['types'].str.replace(', establishment', '')
final_dataframe['types'] = final_dataframe['types'].str.replace('point_of_interest', '')

# removing beginning and ending commas of strings in types, using strip()
final_dataframe['types'] = final_dataframe['types'].str.strip(',')

In [40]:
# seeing the current column names of the dataframe:
final_dataframe.columns

Index(['formatted_address', 'geometry.location.lat', 'geometry.location.lng',
       'geometry.viewport.northeast.lat', 'geometry.viewport.northeast.lng',
       'geometry.viewport.southwest.lat', 'geometry.viewport.southwest.lng',
       'icon', 'id', 'lifeline', 'name', 'opening_hours.open_now', 'place_id',
       'plus_code.compound_code', 'plus_code.global_code', 'price_level',
       'rating', 'reference', 'scope', 'types', 'user_ratings_total'],
      dtype='object')

In [41]:
# renaming desired columns to better names:
final_dataframe.rename(columns = {'types': 'Business Category', 
                     'name' : 'Business Name', 
                     'formatted_address' :  'Business Address', 
                     'geometry.location.lat' : 'Latitude',
                     'geometry.location.lng' : 'Longitude',
                     'lifeline' : 'Lifeline'}, inplace=True)

In [42]:
# dropping unnecessary columns:
final_dataframe.drop(columns=['geometry.viewport.northeast.lat',
       'geometry.viewport.northeast.lng', 'geometry.viewport.southwest.lat',
       'geometry.viewport.southwest.lng', 'icon', 'id', 'opening_hours.open_now', 'place_id',
       'plus_code.compound_code', 'plus_code.global_code', 'price_level',
       'rating', 'reference', 'scope', 'user_ratings_total'], inplace=True)

In [43]:
# resetting index to avoid issues with numbering after changing number of rows:
final_dataframe.reset_index(drop=True, inplace=True)

In [44]:
# viewing counts for each lifeline in Lifeline column:
final_dataframe['Lifeline'].value_counts()

2    1316
3     418
4     411
1     391
6     368
5     305
7      98
Name: Lifeline, dtype: int64

In [47]:
# removing any residual DC, VA, and MD locations from the dataset:
list_to_remove = [' VA ' , ' DC ' , ' MD ']

# sets dataframe to opposite of boolean filter where we have dataframe of just he places containing the strings in the list above
final_dataframe = final_dataframe[~(final_dataframe['Business Address'].str.contains('|'.join(list_to_remove)))]

In [1]:
# seeing shape after removal:
final_dataframe.shape

In [48]:
# fixing the index after removal of rows:
final_dataframe.reset_index(drop=True, inplace=True)

In [51]:
# exporting to a .csv file:
final_dataframe.to_csv('./google.csv')