The code below documents the fairly manual process of collecting Google Places data for the cities of interest in my project on post-conflict urban intergroup integration and its impact on conflict recovery. I created a grid of (generally) 400 x 400 m cells for each city in ArcGIS Pro, then cycled through each grid centroid with the Google Places API to collect all places within a radius that meet my search criteria (primarily religious, cultural, administrative, educational, and medical sites).

The Places API returns at most 20 sites per request, so for each city, after cycling through all the points once, I check whether any request returned 20 sites. For those points, I rerun using a version of the code that searches two groups of types separately (`split`), and in the few cases where a request still hits the API return limit, I query every type separately (`split_all`). I do this instead of splitting all the types from the start to minimize the number of API calls I make, as they are not free. The code below shows the process for a few example cities.
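The escalation described above can be sketched as a small loop. This is only an illustration under stated assumptions, not the code used in this notebook: `fetch` is a hypothetical stand-in for a single API request, the type groups are shortened placeholders, and `fake_fetch` simulates the 20-result limit with made-up data.

```python
API_CAP = 20  # maximum number of places one request can return

# Hypothetical, shortened type groups for illustration only.
GROUP_A = ['library', 'school', 'church', 'mosque']
GROUP_B = ['museum', 'city_hall', 'hospital']
ALL_TYPES = GROUP_A + GROUP_B


def collect_with_escalation(fetch, point):
    """Try one call, then two grouped calls, then one call per type.

    `fetch(point, types)` is an assumed callable returning a list of places.
    Returns (places, number_of_calls_made).
    """
    tiers = [
        [ALL_TYPES],                 # "full": a single request
        [GROUP_A, GROUP_B],          # "split": two requests
        [[t] for t in ALL_TYPES],    # "split_all": one request per type
    ]
    calls = 0
    places = []
    for tier in tiers:
        results = [fetch(point, types) for types in tier]
        calls += len(tier)
        places = [p for res in results for p in res]
        # Stop escalating once no single request hit the cap.
        if all(len(res) < API_CAP for res in results):
            break
    # If even the last tier hit the cap, its (possibly truncated) results
    # are returned as-is.
    return places, calls


def fake_fetch(point, types):
    """Toy stand-in for the API: 5 places per requested type, capped at 20."""
    found = [f"{t}_{i}" for t in types for i in range(5)]
    return found[:API_CAP]


places, calls = collect_with_escalation(fake_fetch, point=(0.0, 0.0))
print(len(places), calls)
```

In this toy run the single call and the two grouped calls both hit the cap, so the point costs 1 + 2 + 7 = 10 requests. That is why escalation is reserved for the few dense points rather than applied everywhere.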

I store the end result as a GeoJSON file for each city and process all of the cities' data together in another script, `calc_clustering_metrics.ipynb`. 

In [None]:
import pandas as pd
import geopandas as gpd
import requests
import json
import numpy as np

In [3]:
import os
os.getcwd()

'/mnt/c/Users/natra/Documents/Research/Identity-Conflict'

## Global Code

In [4]:
def place_extract_point(url,headers,included_types,lat,lon,radius):
    """
    Return the dictionary of places within a radius around the given lat/lon
    """

    json_data = {'includedTypes': included_types,
            'locationRestriction': {
                'circle' : {
                    'center' : {
                        'latitude' : lat,
                        'longitude' :  lon},
                    # radius in meters
                    'radius' : radius
                }}}

    print('posting request')
    response = requests.post(url, headers=headers, json=json_data)
    print('obtained response')
    try:
        places_out = json.loads(response.text)
    except json.decoder.JSONDecodeError:
        print("API returned nothing, JSON Decode Error")
        return None

    return(places_out)
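The callers below check the returned dictionary before indexing `['places']`. As a hedged sketch, a small helper could centralize that check. `places_from_payload` is a hypothetical name and is not part of the original notebook; it assumes that a response without matches, or an error response, is a dictionary lacking a `places` key.

```python
def places_from_payload(payload):
    """Return the list of places in a parsed response, or [] if there is none.

    Handles None (e.g. after a JSON decode failure), an empty dict, and any
    dict lacking a 'places' key (assumed here to indicate no usable results).
    """
    if not isinstance(payload, dict):
        return []
    return payload.get('places', [])


# Illustrative payloads only; they are not real API responses.
empty_cases = [None, {}, {'error': {'code': 403}}]
print([places_from_payload(p) for p in empty_cases])
print(places_from_payload({'places': [{'id': 'abc'}]}))
```

With a helper like this, `dict_places = places_from_payload(out_places)` would replace the `if out_places: ... else: ...` branches.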



In [5]:
def obtain_places_from_points(url, headers, point_gdf,radius=200,places_func="full"):
    """
    From a dataframe of points, search for nearby places around each point
        and return a list of single-entry dictionaries, each mapping a point
        geometry to the list of places found around it
    places_func: either "full" for all place types in a single call;
                "split" if need to use 2 separate calls to obtain the full list of
                    places;
                "split_all" if need to call each location type separately.
                Use "split" or "split_all" when there are too many locations
                    around a given point, to avoid hitting the 20-result
                    maximum of the Google Places API
    """

    point_dicts = []
    nrow = point_gdf.shape[0]
    included_types = ['art_gallery','museum','performing_arts_theater',
                            'library','school','university',
                            'community_center','cultural_center','historical_landmark',
                            'city_hall','courthouse','local_government_office','police',
                            'hospital',
                            'church','hindu_temple','mosque','synagogue']
    if places_func == "split":
        included_types_re =  ['library','school','university',
                            'church','hindu_temple','mosque','synagogue']
        included_types_cg = ['art_gallery','museum','performing_arts_theater',
                            'community_center','cultural_center','historical_landmark',
                            'city_hall','courthouse','local_government_office','police',
                            'hospital']

    for i, row in point_gdf.iterrows():
        print('obtaining places for point ',i,' / ',nrow)
        
        lon = row['geometry'].x
        lat = row['geometry'].y

        if places_func=="full":
            out_places = place_extract_point(url,headers,included_types,lat,lon,radius)
            if out_places:
                dict_places = out_places['places']
            else:
                dict_places = []
        elif places_func=="split":
            print("Obtaining religious and educational places..")
            out_places_re = place_extract_point(url, headers, included_types_re,lat, lon, radius)
            print("Religious & Edu out_places:", out_places_re)
            print("Obtaining cultural and government places..")
            out_places_cg = place_extract_point(url, headers, included_types_cg,lat, lon, radius)
            print("Culture & Gov out_places:", out_places_cg)
            if out_places_re and out_places_cg:
                # use lengths to confirm not 20 places returned, if so, likely hitting limit of api
                # and need to split further
                print("Number of results returned from re: ",len(out_places_re['places']))
                print("Number of results returned from cg: ",len(out_places_cg['places']))
                dict_places = out_places_re['places'] + out_places_cg['places']
            elif out_places_re:
                print("Number of results returned from re: ",len(out_places_re['places']))
                dict_places = out_places_re['places']
            elif out_places_cg:
                print("Number of results returned from cg: ",len(out_places_cg['places']))
                dict_places = out_places_cg['places']
            else:
                dict_places = []
        elif places_func == "split_all":
            dict_places = []
            print("Obtaining places type by type")
            for pl_type in included_types:
                print("Obtaining for type ",pl_type)
                out_places = place_extract_point(url,headers,[pl_type],lat,lon,radius)
                if out_places:
                    print(f"Number of results returned from type {pl_type}: {len(out_places['places'])}")
                    dict_places = dict_places + out_places['places']
        
        print(dict_places)

        # create dictionary with geometry of point as the key
        point_dict = {row['geometry']: dict_places}

        point_dicts.append(point_dict)

    return(point_dicts)

In [6]:
def clean_maps_api_output(pts_lst):
    """
    Take full list of points from Google Maps API output and convert
        to GeoPandas geodataframe for future use.
    """
    pts_lst_content = [pt for pt in pts_lst if len(list(pt.values())[0]) > 0]
    print("Number of place dictionaries returned from API: ",len(pts_lst_content))

    # convert place list to tables
    df_lst = []
    for pt_entry in pts_lst_content:
        df_row = pd.json_normalize(list(pt_entry.values())[0])
        df_row['api_loc'] = list(pt_entry.keys())[0]
        df_lst.append(df_row)
    loc_df = pd.concat(df_lst)

    print("Shape of resulting dataframe: ",loc_df.shape)
    # remove duplicates caused by overlapping cells
    loc_df = loc_df.drop_duplicates(subset=['location.latitude',
                                            'location.longitude',
                                            'id']).reset_index(drop=True)
    print("Shape of dataframe after dropping dups: ",loc_df.shape)

    # rename columns to friendlier names
    loc_df = loc_df.rename(columns={'location.latitude':'latitude',
                                                'location.longitude':'longitude',
                                                'displayName.text':'displayName',
                                                'displayName.languageCode':'nameLang',
                                                'primaryTypeDisplayName.text':'primaryTypeDisplayName',
                                                'primaryTypeDisplayName.languageCode':
                                                'primaryTypeDisplayLang'})

    # turn the api location point column into lat/lon to be able to store final file with single geom
    loc_df = gpd.GeoDataFrame(loc_df, geometry='api_loc')
    loc_df['api_lat'] = loc_df['api_loc'].y
    loc_df['api_lon'] = loc_df['api_loc'].x
    loc_df.drop(columns=['api_loc'], inplace=True)

    # convert to gdf with actual loc column as geometry
    loc_gdf = gpd.GeoDataFrame(loc_df, 
                                        geometry=gpd.points_from_xy(loc_df.longitude
                                                                    ,loc_df.latitude),
                                        crs='epsg:4326')
    # convert list to str for storage
    loc_gdf['types'] = loc_gdf['types'].astype('str')
    
    return loc_gdf

In [None]:
API_KEY = ""   # removed for privacy


url = "https://places.googleapis.com/v1/places:searchNearby"
headers = {
    'Content-Type': 'application/json',
    'X-Goog-Api-Key': API_KEY,
    'X-Goog-FieldMask': 'places.displayName,places.formattedAddress,places.businessStatus,places.id,places.location,places.primaryType,places.primaryTypeDisplayName,places.shortFormattedAddress,places.subDestinations,places.types',
}
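The per-city cells below repeat the same steps to separate points that hit the 20-result limit from those that did not. One possible way to package that step is sketched here. `partition_capped` is a hypothetical helper that is not used elsewhere in this notebook; it assumes the list of single-entry dictionaries returned by `obtain_places_from_points`.

```python
def partition_capped(points_lst, cap=20):
    """Split results into (good, capped_keys).

    points_lst: list of {point_geometry: [places]} dictionaries.
    good: entries whose request returned fewer than `cap` places.
    capped_keys: geometries whose request returned `cap` or more places.
    """
    good, capped_keys = [], []
    for entry in points_lst:
        (geom, places), = entry.items()  # each entry holds exactly one point
        if len(places) >= cap:
            capped_keys.append(geom)
        else:
            good.append(entry)
    return good, capped_keys


# Toy example with strings standing in for shapely Point keys.
demo = [{'pt_a': [1] * 3}, {'pt_b': [1] * 20}, {'pt_c': []}]
good, capped = partition_capped(demo)
print(len(good), capped)
```

The `capped` list could then be passed to `grid[grid['geometry'].isin(capped)]` to select the points to rerun.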


## Croatia

### Osijek

In [7]:
# created the grid as 400 x 400 m (w x h) cells with the Fishnet tool in ArcGIS Pro
grid = gpd.read_file('./croatia_files/raw/site_locations/osijek-fishnet-clip-4326.shp')
print(grid.shape)
print(grid.crs)
grid.head()

(525, 2)
EPSG:4326


Unnamed: 0,Id,geometry
0,0,POINT (18.62279 45.51391)
1,0,POINT (18.62791 45.51381)
2,0,POINT (18.63303 45.51372)
3,0,POINT (18.63814 45.51362)
4,0,POINT (18.64326 45.51352)


In [8]:
points_lst = obtain_places_from_points(url, headers, grid, 200)

obtaining places for point  0  /  525
posting request
obtained response
[]
obtaining places for point  1  /  525
posting request
obtained response
[]
obtaining places for point  2  /  525
posting request
obtained response
[]
obtaining places for point  3  /  525
posting request
obtained response
[]
obtaining places for point  4  /  525
posting request
obtained response
[]
obtaining places for point  5  /  525
posting request
obtained response
[]
obtaining places for point  6  /  525
posting request
obtained response
[]
obtaining places for point  7  /  525
posting request
obtained response
[]
obtaining places for point  8  /  525
posting request
obtained response
[]
obtaining places for point  9  /  525
posting request
obtained response
[]
obtaining places for point  10  /  525
posting request
obtained response
[]
obtaining places for point  11  /  525
posting request
obtained response
[]
obtaining places for point  12  /  525
posting request
obtained response
[]
obtaining places for p

In [9]:
# check that no points reached Google's 20-place cutoff
idxs_of_problem_spots = [i for i, pt in enumerate(points_lst) if len(list(pt.values())[0]) > 19]
idxs_of_problem_spots

[]

In [10]:
city_gdf = clean_maps_api_output(points_lst)

city_gdf.to_file("/mnt/c/Users/natra/Documents/Research/Identity-Conflict/croatia_files/clean/osijek_locs_400x400.geojson",
                          driver='GeoJSON')

Number of place dictionaries returned from API:  70
Shape of resulting dataframe:  (184, 13)
Shape of dataframe after dropping dups:  (184, 13)


### Rijeka

In [11]:
# created the grid as 400 x 400 m (w x h) cells with the Fishnet tool in ArcGIS Pro
grid = gpd.read_file('./croatia_files/raw/site_locations/rijeka-fishnet-clip-4326.shp')
print(grid.shape)
print(grid.crs)
grid.head()

(340, 2)
EPSG:4326


Unnamed: 0,Id,geometry
0,0,POINT (14.53001 45.30597)
1,0,POINT (14.53511 45.30606)
2,0,POINT (14.51969 45.3094)
3,0,POINT (14.52479 45.30948)
4,0,POINT (14.52989 45.30957)


In [12]:
points_lst = obtain_places_from_points(url, headers, grid, 200)

obtaining places for point  0  /  340
posting request
obtained response
[]
obtaining places for point  1  /  340
posting request
obtained response
[{'id': 'ChIJw4KMjwOfZEcRrsIEbR4nRt4', 'types': ['cultural_center', 'point_of_interest', 'establishment'], 'formattedAddress': 'Primorje ul. 39, 51222, Bakar, Croatia', 'location': {'latitude': 45.3053059, 'longitude': 14.534917000000002}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'Matija Mažić Community Centre', 'languageCode': 'en'}, 'primaryTypeDisplayName': {'text': 'Cultural Center', 'languageCode': 'en-US'}, 'primaryType': 'cultural_center', 'shortFormattedAddress': 'Primorje ul. 39, Bakar'}, {'id': 'ChIJ1agoXaOfZEcRcAesr6p4Vro', 'types': ['church', 'place_of_worship', 'point_of_interest', 'establishment'], 'formattedAddress': '51222, Bakar, Croatia', 'location': {'latitude': 45.3073754, 'longitude': 14.533659900000002}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'St. Andrew’s Church', 'languageCode': 'en'}

In [None]:
# extract the usable output from the above run
idxs_of_problem_spots = [i for i, pt in enumerate(points_lst) if len(list(pt.values())[0]) > 19]
print(idxs_of_problem_spots)
# keep only the results that weren't cut off by the api limit
points_lst_good = [pt for idx, pt in enumerate(points_lst) if idx not in idxs_of_problem_spots]
print(len(points_lst_good))

# re-run the points that returned 20 locations using the 'split' method
# first, get the result entries for each of the points that hit the 20-place limit
problem_points = [points_lst[i] for i in idxs_of_problem_spots]
print(len(problem_points))
# then, get these geometries as a list
prob_point_vals = [list(pt.keys())[0] for pt in problem_points]
print(prob_point_vals)
# finally, filter the overall grid space to identify these problem geometries
pts_to_repeat = grid[grid['geometry'].isin(prob_point_vals)]
pts_to_repeat

[57, 73, 74]
337
3
[<POINT (14.443 45.326)>, <POINT (14.437 45.33)>, <POINT (14.442 45.33)>]


Unnamed: 0,Id,geometry
57,0,POINT (14.44257 45.32603)
73,0,POINT (14.43734 45.32953)
74,0,POINT (14.44244 45.32963)


In [15]:
points_lst_fixed = obtain_places_from_points(url, headers, pts_to_repeat, 200, 'split')

obtaining places for point  57  /  3
Obtaining religious and educational places..
posting request
obtained response
Religious & Edu out_places: {'places': [{'id': 'ChIJbRuc-OGgZEcRE6wanKxOuNQ', 'types': ['university', 'point_of_interest', 'establishment'], 'formattedAddress': 'Trg Riječke rezolucije 4, 51000, Rijeka, Croatia', 'location': {'latitude': 45.327624, 'longitude': 14.441770499999999}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'Visoka poslovna škola PAR', 'languageCode': 'en'}, 'primaryTypeDisplayName': {'text': 'College', 'languageCode': 'en-US'}, 'primaryType': 'university', 'shortFormattedAddress': 'Trg Riječke rezolucije 4, Rijeka'}, {'id': 'ChIJGZyGAuKgZEcRs3UGZAWqAeM', 'types': ['tourist_attraction', 'church', 'place_of_worship', 'point_of_interest', 'establishment'], 'formattedAddress': 'Trg Riječke rezolucije, 51000, Rijeka, Croatia', 'location': {'latitude': 45.3273295, 'longitude': 14.4421117}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 

In [16]:
# one point still hit the 20-place limit even with the types split, so save the others that returned < 20..
points_lst_fixed_good = [pt for idx, pt in enumerate(points_lst_fixed) if idx in [1,2]]
print(len(points_lst_fixed_good))

# .. and re-run the point which still hit the 20-place limit using the 'split_all' method
# first, get the result entries for each of the points that hit the limit
problem_points = [points_lst_fixed[i] for i in [0]]
print(problem_points)
print(len(problem_points))
# then, get these geometries as a list
prob_point_vals = [list(pt.keys())[0] for pt in problem_points]
print(prob_point_vals)
# finally, filter the overall grid space to identify these problem geometries
pts_to_repeat = grid[grid['geometry'].isin(prob_point_vals)]
pts_to_repeat

2
[{<POINT (14.443 45.326)>: [{'id': 'ChIJbRuc-OGgZEcRE6wanKxOuNQ', 'types': ['university', 'point_of_interest', 'establishment'], 'formattedAddress': 'Trg Riječke rezolucije 4, 51000, Rijeka, Croatia', 'location': {'latitude': 45.327624, 'longitude': 14.441770499999999}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'Visoka poslovna škola PAR', 'languageCode': 'en'}, 'primaryTypeDisplayName': {'text': 'College', 'languageCode': 'en-US'}, 'primaryType': 'university', 'shortFormattedAddress': 'Trg Riječke rezolucije 4, Rijeka'}, {'id': 'ChIJGZyGAuKgZEcRs3UGZAWqAeM', 'types': ['tourist_attraction', 'church', 'place_of_worship', 'point_of_interest', 'establishment'], 'formattedAddress': 'Trg Riječke rezolucije, 51000, Rijeka, Croatia', 'location': {'latitude': 45.3273295, 'longitude': 14.4421117}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'Church of St. Jerome', 'languageCode': 'en'}, 'primaryTypeDisplayName': {'text': 'Church', 'languageCode': 'en-US'}, 'primary

Unnamed: 0,Id,geometry
57,0,POINT (14.44257 45.32603)


In [17]:
# this worked, didn't reach the 20 limit when split by each type
points_lst_fixed_fin = obtain_places_from_points(url, headers, pts_to_repeat, 200, 'split_all')

obtaining places for point  57  /  1
Obtaining places type by type
Obtaining for type  art_gallery
posting request
obtained response
Number of results returned from type art_gallery: 6
Obtaining for type  museum
posting request
obtained response
Number of results returned from type museum: 1
Obtaining for type  performing_arts_theater
posting request
obtained response
Number of results returned from type performing_arts_theater: 1
Obtaining for type  library
posting request
obtained response
Number of results returned from type library: 1
Obtaining for type  school
posting request
obtained response
Number of results returned from type school: 6
Obtaining for type  university
posting request
obtained response
Number of results returned from type university: 2
Obtaining for type  community_center
posting request
obtained response
Obtaining for type  cultural_center
posting request
obtained response
Obtaining for type  historical_landmark
posting request
obtained response
Number of result

In [18]:
points_lst_fin = points_lst_fixed_fin + points_lst_fixed_good + points_lst_good
print(len(points_lst_fin))

city_gdf = clean_maps_api_output(points_lst_fin)

city_gdf.to_file("/mnt/c/Users/natra/Documents/Research/Identity-Conflict/croatia_files/clean/rijeka_locs_400x400.geojson",
                          driver='GeoJSON')

340
Number of place dictionaries returned from API:  114
Shape of resulting dataframe:  (361, 13)
Shape of dataframe after dropping dups:  (359, 13)


## Serbia

### Novi Pazar

In [7]:
# created the grid as 400 x 400 m (w x h) cells with the Fishnet tool in ArcGIS Pro
grid = gpd.read_file('./serbia_files/raw/site_locations/novi-pazar-fishnet-clip-4326.shp')
print(grid.shape)
print(grid.crs)
grid.head()

(115, 2)
EPSG:4326


Unnamed: 0,Id,geometry
0,0,POINT (20.49476 43.1168)
1,0,POINT (20.49968 43.11682)
2,0,POINT (20.48981 43.12038)
3,0,POINT (20.49473 43.1204)
4,0,POINT (20.49965 43.12042)


In [10]:
points_lst = obtain_places_from_points(url, headers, grid, 200)

obtaining places for point  0  /  115
posting request
obtained response
[]
obtaining places for point  1  /  115
posting request
obtained response
[]
obtaining places for point  2  /  115
posting request
obtained response
[]
obtaining places for point  3  /  115
posting request
obtained response
[{'id': 'ChIJx2HJPgAtUxMRkq1IosYopR0', 'types': ['mosque', 'place_of_worship', 'point_of_interest', 'establishment'], 'formattedAddress': 'Mur 48, Paralovo, Serbia', 'location': {'latitude': 43.119771199999995, 'longitude': 20.4944107}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'Nur Džamija', 'languageCode': 'bs'}, 'primaryTypeDisplayName': {'text': 'Mosque', 'languageCode': 'en-US'}, 'primaryType': 'mosque', 'shortFormattedAddress': 'Mur 48, Paralovo'}]
obtaining places for point  4  /  115
posting request
obtained response
[]
obtaining places for point  5  /  115
posting request
obtained response
[{'id': 'ChIJl_cQH0wtUxMRMv__Z9sLY3I', 'types': ['mosque', 'place_of_worship', 'po

In [11]:
# check that no points reached Google's 20-place cutoff
idxs_of_problem_spots = [i for i, pt in enumerate(points_lst) if len(list(pt.values())[0]) > 19]
idxs_of_problem_spots

[]

In [12]:
city_gdf = clean_maps_api_output(points_lst)

city_gdf.to_file("/mnt/c/Users/natra/Documents/Research/Identity-Conflict/serbia_files/clean/novi_pazar_locs_400x400.geojson",
                          driver='GeoJSON')

Number of place dictionaries returned from API:  43
Shape of resulting dataframe:  (117, 13)
Shape of dataframe after dropping dups:  (117, 13)


### Novi Sad

In [20]:
# created the grid as 400 x 400 m (w x h) cells with the Fishnet tool in ArcGIS Pro
grid = gpd.read_file('./serbia_files/raw/site_locations/novi-sad-fishnet-clip-4326.shp')
print(grid.shape)
print(grid.crs)
grid.head()

(546, 2)
EPSG:4326


Unnamed: 0,Id,geometry
0,0,POINT (19.80307 45.22342)
1,0,POINT (19.80816 45.22347)
2,0,POINT (19.81326 45.22353)
3,0,POINT (19.81835 45.22358)
4,0,POINT (19.78262 45.22681)


In [21]:
points_lst = obtain_places_from_points(url, headers, grid, 200)

obtaining places for point  0  /  546
posting request
obtained response
[]
obtaining places for point  1  /  546
posting request
obtained response
[]
obtaining places for point  2  /  546
posting request
obtained response
[]
obtaining places for point  3  /  546
posting request
obtained response
[]
obtaining places for point  4  /  546
posting request
obtained response
[{'id': 'ChIJGXBCTzAOW0cRNuxPFj_65bE', 'types': ['school', 'point_of_interest', 'establishment'], 'formattedAddress': 'Kamenjar 3 br. 7, Novi Sad 21000, Serbia', 'location': {'latitude': 45.2280387, 'longitude': 19.7836914}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'Privatna osnovna škola "Kirilo i Metodije" Novi Sad', 'languageCode': 'en'}, 'primaryTypeDisplayName': {'text': 'School', 'languageCode': 'en-US'}, 'primaryType': 'school', 'shortFormattedAddress': 'Kamenjar 3 br. 7, Novi Sad'}]
obtaining places for point  5  /  546
posting request
obtained response
[]
obtaining places for point  6  /  546
pos

In [23]:
# identify any points which reached Google's 20-place cutoff
idxs_of_problem_spots = [i for i, pt in enumerate(points_lst) if len(list(pt.values())[0]) > 19]
idxs_of_problem_spots

# keep only the results that weren't cut off by the api limit
points_lst_good = [pt for idx, pt in enumerate(points_lst) if idx not in idxs_of_problem_spots]
print(len(points_lst_good))

# re-run the points that returned 20 locations using the 'split' method
# first, get the result entries for each of the points that hit the 20-place limit
problem_points = [points_lst[i] for i in idxs_of_problem_spots]
print(len(problem_points))
# then, get these geometries as a list
prob_point_vals = [list(pt.keys())[0] for pt in problem_points]
print(prob_point_vals)
# finally, filter the overall grid space to identify these problem geometries
pts_to_repeat = grid[grid['geometry'].isin(prob_point_vals)]
pts_to_repeat


542
4
[<POINT (19.843 45.253)>, <POINT (19.848 45.253)>, <POINT (19.843 45.256)>, <POINT (19.848 45.256)>]


Unnamed: 0,Id,geometry
140,0,POINT (19.84324 45.25264)
141,0,POINT (19.84833 45.25269)
162,0,POINT (19.84316 45.25624)
163,0,POINT (19.84826 45.25629)


In [None]:
# all works, good to go
points_lst_fixed = obtain_places_from_points(url, headers, pts_to_repeat, 200, 'split')

obtaining places for point  140  /  4
Obtaining religious and educational places..
posting request
obtained response
Religious & Edu out_places: {'places': [{'id': 'ChIJUTlceWkQW0cRBs_grCGE_eY', 'types': ['university', 'point_of_interest', 'establishment'], 'formattedAddress': 'Bulevar Mihajla Pupina 4a, Novi Sad, Serbia', 'location': {'latitude': 45.253130899999995, 'longitude': 19.8440284}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'Singidunum university', 'languageCode': 'en'}, 'primaryTypeDisplayName': {'text': 'University', 'languageCode': 'en-US'}, 'primaryType': 'university', 'shortFormattedAddress': 'Bulevar Mihajla Pupina 4a, Novi Sad'}, {'id': 'ChIJYZTN0msQW0cRcdx-f6eW6-8', 'types': ['secondary_school', 'school', 'point_of_interest', 'establishment'], 'formattedAddress': 'Narodnih heroja 7, Novi Sad 21000, Serbia', 'location': {'latitude': 45.2524278, 'longitude': 19.843636999999998}, 'businessStatus': 'OPERATIONAL', 'displayName': {'text': 'Srednja škola "Svet

In [None]:
points_lst_fin = points_lst_fixed + points_lst_good
print(len(points_lst_fin))

city_gdf = clean_maps_api_output(points_lst_fin)

city_gdf.to_file("/mnt/c/Users/natra/Documents/Research/Identity-Conflict/serbia_files/clean/novi_sad_locs_400x400.geojson",
                          driver='GeoJSON')