## Geoapify Feature Extraction

The purpose of this data retrieval is to add information about each restaurant, particularly the cuisine type.
If I can retrieve all the restaurants and cafes, I can also pick out popular chains and check whether other businesses' locations are related to those chains' store locations.
For example, Starbucks places its stores in prime locations where customer traffic is high and customers are more affluent.

**Extra features to add via API**

1.  Geoapify cuisine type
1.  \*Geoapify number of Starbucks within a 0.5 mile radius
1.  \*Geoapify related businesses within a 0.5 mile radius
1.  \*US Census tract income information

\* indicates that I will work on this later
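
The "within a 0.5 mile radius" features in the list above come down to counting points inside a circle. A minimal sketch of that count, assuming great-circle (haversine) distance is accurate enough at this scale; the function names and the sample coordinates are made up for illustration and are not part of this notebook:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def count_within(lat, lon, points, radius_miles=0.5):
    """Count how many (lat, lon) points lie within radius_miles of the given location."""
    return sum(
        1 for p_lat, p_lon in points
        if haversine_miles(lat, lon, p_lat, p_lon) <= radius_miles
    )


# Hypothetical store locations: one at the query point, one about 0.2 miles
# north of it, and one about 1.4 miles north of it.
stores = [(41.8800, -87.6300), (41.8830, -87.6300), (41.9000, -87.6300)]
nearby = count_within(41.8800, -87.6300, stores)
```

Applied per restaurant, the same function would give the Starbucks count once the Starbucks coordinates are available.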

## Setup

In [2]:
# Import required libraries

# # Code formatter
# # !pip3 install nb_black
# %load_ext nb_black

# eda tools
import pandas as pd
pd.options.display.max_columns = 100

# api tools
import requests
import json

# Import the API key
from config import geoapify_key

# hide jupyter lab warnings
import warnings
warnings.filterwarnings('ignore')


# make sound when this code executes: Audio(sound_file, autoplay=True)
from IPython.display import Audio
sound_file = './sound/chord.wav'

# display package information
# !conda install -c conda-forge session-info
import session_info
session_info.show()

### Read Dataset

In [4]:
# Read data
restaurant_df = pd.read_csv('./data/manipulated/combined_data.csv')
grid_layout = pd.read_csv('./data/manipulated/chicago_point_grid.csv')

# View key file
grid_layout.head()

Unnamed: 0,lat,lon,geometry
0,-87.640611,41.661182,POINT (-87.64061065791103 41.66118220092816)
1,-87.607332,41.661182,POINT (-87.60733249962075 41.66118220092816)
2,-87.590693,41.661182,POINT (-87.59069342047562 41.66118220092816)
3,-87.574054,41.661182,POINT (-87.57405434133048 41.66118220092816)
4,-87.557415,41.661182,POINT (-87.55741526218534 41.66118220092816)


### Places API

In [5]:
# Geoapify - 3000 queries per day
# Places API

# Set to True to use API
query_data = False

if query_data:

    all_data = []
    for i in grid_layout.iterrows():

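        # query parameters
        # NOTE: the column names in the grid file are swapped: `lat` holds
        # longitude values and `lon` holds latitude values (see the output above)
        long,lat = i[1].lat,i[1].lon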
        categories = "catering.restaurant"
        radius = 1200
        filters = f"circle:{long},{lat},{radius}"
        bias = f"proximity:{long},{lat}"
        limit = 200

        REQUEST_PARAMS = {
            "categories":categories,
            "limit":limit,
            "filter":filters,
            "bias":bias,
            "apiKey":geoapify_key    
        }

        # get business info - should I use places-details ??
        places_url = 'https://api.geoapify.com/v2/places'

        geo_data = requests.get(places_url, params=REQUEST_PARAMS).json()

        # Print the json (pretty printed)
        # print(json.dumps(geo_data, indent=4, sort_keys=True))
        all_data.append(geo_data)

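    # play sound when done; display() is needed because an Audio object is only
    # rendered automatically when it is the last expression in a cell
    from IPython.display import display
    display(Audio(sound_file, autoplay=True))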

    # save list of dictionaries to file
    with open('./data/manipulated/geo_data.json', 'w') as f:
        json.dump(all_data, f, indent=4)
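
The query above asks for at most 200 places per circle (`limit = 200`), so a dense circle could be cut off at 200 results. A minimal sketch of paging through the results follows. It assumes the Places API accepts an `offset` parameter alongside `limit`, which should be checked against the Geoapify documentation. `fetch_all_features` and `make_geoapify_fetcher` are hypothetical helpers, not part of this notebook or of any library.

```python
def fetch_all_features(fetch_page, limit=200, max_pages=25):
    """Collect features page by page until a short page signals the end.

    fetch_page is any callable taking (offset, limit) and returning a list.
    """
    features = []
    for page in range(max_pages):
        batch = fetch_page(page * limit, limit)
        features.extend(batch)
        if len(batch) < limit:  # a short (or empty) page means nothing is left
            break
    return features


def make_geoapify_fetcher(lon, lat, radius_m, api_key):
    """Build a fetch_page callable for one search circle (hypothetical helper)."""
    import requests  # imported lazily so fetch_all_features has no dependencies

    def fetch_page(offset, limit):
        params = {
            "categories": "catering.restaurant",
            "filter": f"circle:{lon},{lat},{radius_m}",
            "bias": f"proximity:{lon},{lat}",
            "limit": limit,
            "offset": offset,  # assumption: pagination parameter
            "apiKey": api_key,
        }
        response = requests.get(
            "https://api.geoapify.com/v2/places", params=params, timeout=30
        )
        response.raise_for_status()  # fail loudly on HTTP errors
        return response.json().get("features", [])

    return fetch_page
```

Each extra page is another request, so paging counts against the daily query budget noted at the top of the cell.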

In [6]:
# load data from previous api query
with open('./data/manipulated/geo_data.json', 'r') as f:
    all_data = json.load(f)

In [22]:
# view one record
all_data[4]

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'properties': {'name': 'Chicago Pita Kitchen',
    'country': 'United States',
    'country_code': 'us',
    'state': 'Illinois',
    'county': 'Cook County',
    'city': 'Chicago',
    'municipality': 'Hyde Park Township',
    'postcode': '60633',
    'suburb': 'Hegewisch',
    'street': 'South Brainard Avenue',
    'housenumber': '13227',
    'lon': -87.55396332945435,
    'lat': 41.6550718,
    'state_code': 'IL',
    'formatted': 'Chicago Pita Kitchen, 13227 South Brainard Avenue, Chicago, IL 60633, United States of America',
    'address_line1': 'Chicago Pita Kitchen',
    'address_line2': '13227 South Brainard Avenue, Chicago, IL 60633, United States of America',
    'categories': ['building',
     'building.catering',
     'catering',
     'catering.restaurant',
     'catering.restaurant.greek'],
    'details': ['details',
     'details.building',
     'details.catering',
     'details.contact'],
    'datasource'

In [7]:
# capture only the important categories
restaurants_only = []
for region in all_data:
    for i in region['features']:
        restaurants_only.append(i['properties'])

In [8]:
# Chicago is supposed to have around 7,300 restaurants
len(restaurants_only)

3701
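
The count of 3,701 may overstate the number of distinct restaurants: when neighbouring search circles overlap, the same place can be returned by more than one query. A minimal sketch of counting distinct places, assuming `place_id` identifies a place uniquely; the helper name `unique_by_place_id` and the sample records are made up for illustration:

```python
def unique_by_place_id(records):
    """Keep the first record seen for each place_id, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        place_id = record["place_id"]
        if place_id in seen:
            continue  # already collected from an earlier query
        seen.add(place_id)
        unique.append(record)
    return unique


# Hypothetical records standing in for restaurants_only
sample_records = [
    {"name": "A", "place_id": "p1", "distance": 731},
    {"name": "B", "place_id": "p2", "distance": 1079},
    {"name": "B", "place_id": "p2", "distance": 806},  # same place, other circle
]
distinct = unique_by_place_id(sample_records)
```

Comparing `len(unique_by_place_id(restaurants_only))` with `len(restaurants_only)` would show how many of the records are repeats.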

In [9]:
# view one record
restaurants_only[1]

{'name': "Doreen's Pizzeria",
 'country': 'United States',
 'country_code': 'us',
 'state': 'Illinois',
 'county': 'Cook County',
 'city': 'Chicago',
 'municipality': 'Hyde Park Township',
 'postcode': '60633',
 'suburb': 'Hegewisch',
 'street': 'South Baltimore Avenue',
 'housenumber': '13201',
 'lon': -87.54682686006686,
 'lat': 41.655375250000006,
 'state_code': 'IL',
 'formatted': "Doreen's Pizzeria, 13201 South Baltimore Avenue, Chicago, IL 60633, United States of America",
 'address_line1': "Doreen's Pizzeria",
 'address_line2': '13201 South Baltimore Avenue, Chicago, IL 60633, United States of America',
 'categories': ['building',
  'building.catering',
  'catering',
  'catering.restaurant',
  'catering.restaurant.pizza'],
 'details': ['details', 'details.building', 'details.catering'],
 'datasource': {'sourcename': 'openstreetmap',
  'attribution': '© OpenStreetMap contributors',
  'license': 'Open Database Licence',
  'url': 'https://www.openstreetmap.org/copyright',
  'raw': 

In [17]:
# extract data from json
# I could do this in simpler ways - extract out only a section and convert to dataframe would be the easiest
# I could also use the keys as the column names and loop through the json 

# output column -> path of keys inside each record
FIELDS = {
    'name': ['name'],
    'municipality': ['municipality'],
    'zipcode': ['postcode'],
    'neighborhood': ['suburb'],
    'street': ['street'],
    'housenumber': ['housenumber'],
    'categories': ['categories'],
    'business_type': ['datasource', 'raw', 'amenity'],
    'cuisine': ['datasource', 'raw', 'cuisine'],
    'website': ['datasource', 'raw', 'website'],
    'building': ['datasource', 'raw', 'building'],
    'building-levels': ['datasource', 'raw', 'building:levels'],
    'distance': ['distance'],
}

def get_nested(record, path):
    """Follow a path of keys; return None if any step is missing."""
    value = record
    for key in path:
        try:
            value = value[key]
        except (KeyError, TypeError):
            return None
    return value

# create a list of dictionaries; missing fields become None
primary_data = []
for record in restaurants_only:
    restaurant_dict = {column: get_nested(record, path) for column, path in FIELDS.items()}
    # place_id is required, so a missing value should raise an error
    restaurant_dict['place_id'] = record['place_id']
    primary_data.append(restaurant_dict)

# create dataframe
df = pd.DataFrame(primary_data)
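
The comments at the top of this cell mention that a simpler route exists. One option is `pd.json_normalize`, which flattens nested dictionaries into dotted column names. A minimal sketch on made-up records shaped like the ones in `restaurants_only`; the `wanted` mapping is illustrative and covers only a few of the columns:

```python
import pandas as pd

# Hypothetical records; the second one has no cuisine and neither has a website
sample_records = [
    {"name": "A", "place_id": "p1", "categories": ["catering"],
     "datasource": {"raw": {"amenity": "restaurant", "cuisine": "pizza"}}},
    {"name": "B", "place_id": "p2", "categories": ["catering"],
     "datasource": {"raw": {"amenity": "restaurant"}}},
]

# nested keys become columns such as 'datasource.raw.cuisine'
flat = pd.json_normalize(sample_records)

# flattened column name -> name used in this notebook
wanted = {
    "name": "name",
    "datasource.raw.amenity": "business_type",
    "datasource.raw.cuisine": "cuisine",
    "datasource.raw.website": "website",
    "place_id": "place_id",
}

# reindex keeps the chosen columns and fills any that are absent with NaN
small = flat.reindex(columns=list(wanted)).rename(columns=wanted)
```

Missing keys come out as NaN rather than None, which `df.info()` counts as missing in the same way.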

In [18]:
# view data
df.head()

Unnamed: 0,name,municipality,zipcode,neighborhood,street,housenumber,categories,business_type,cuisine,website,building,building-levels,distance,place_id
0,Chicago Pita Kitchen,Hyde Park Township,60633,Hegewisch,South Brainard Avenue,13227,"[building, building.catering, catering, cateri...",restaurant,greek,http://www.chicagopita.com/,yes,2.0,731,51bf76b6f073e355c05914a1d2cad9d34440f00102f901...
1,Doreen's Pizzeria,Hyde Park Township,60633,Hegewisch,South Baltimore Avenue,13201,"[building, building.catering, catering, cateri...",restaurant,pizza,https://www.doreenspizzeria.com/,yes,2.0,1079,51f4706e32ffe255c059279e528ae3d34440f00102f901...
2,Doreen's Pizzeria,Hyde Park Township,60633,Hegewisch,South Baltimore Avenue,13201,"[building, building.catering, catering, cateri...",restaurant,pizza,https://www.doreenspizzeria.com/,yes,2.0,806,51f4706e32ffe255c059279e528ae3d34440f00102f901...
3,Taquaria el Taquin,Hyde Park Township,60633,Hegewisch,South Brandon Avenue,13307,"[building, building.catering, building.commerc...",restaurant,mexican,,retail,2.0,945,514b9fecf2ebe255c059f541ee0ea3d34440f00102f901...
4,China Garden,Hyde Park Township,60633,Hegewisch,South Baltimore Avenue,13328,"[building, building.catering, building.commerc...",restaurant,chinese,https://www.chinagardenil.net/,retail,1.0,1069,51398165a50ae355c059060ae63191d34440f00102f901...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3696,Taste of Peru,Rogers Park Township,60626,Rogers Park,North Clark Street,6545,"[catering, catering.restaurant, catering.resta...",restaurant,peruvian,https://tasteofperu.com/,,,1046,51b23f9afbf5ea55c0596157bf7e2d004540f00103f901...
3697,Peckish Pig,Evanston Township,60202,Rogers Park,West Howard Street,623,"[catering, catering.restaurant]",restaurant,pub,,,,1098,51438e08217beb55c059c2d1d03582024540f00103f901...
3698,Giordano's Pizzeria,Rogers Park Township,60626,Rogers Park,North Sheridan Road,6836,"[building, building.catering, building.commerc...",restaurant,pizza,https://giordanos.com/locations/rogers-park/,retail,,1104,51e7bb090057ea55c059ff7a2bccd7004540f00102f901...
3699,TJam Kitchen,Rogers Park Township,60626,Rogers Park,West Howard Street,1418,"[catering, catering.restaurant]",restaurant,,,,,1131,51d6500f76afea55c05928b559eb7a024540f00103f901...


In [20]:
# review missing information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3701 entries, 0 to 3700
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   name             3661 non-null   object 
 1   municipality     2658 non-null   object 
 2   zipcode          3701 non-null   object 
 3   neighborhood     3404 non-null   object 
 4   street           3701 non-null   object 
 5   housenumber      3595 non-null   object 
 6   categories       3701 non-null   object 
 7   business_type    3701 non-null   object 
 8   cuisine          2522 non-null   object 
 9   website          2087 non-null   object 
 10  building         530 non-null    object 
 11  building-levels  344 non-null    float64
 12  distance         3701 non-null   int64  
 13  place_id         3701 non-null   object 
dtypes: float64(1), int64(1), object(12)
memory usage: 404.9+ KB


#### Review  
The building and building-levels features are probably unusable because most of their values are missing (530 and 344 non-null out of 3,701).  
I wish cuisine coverage were a bit higher (2,522 of 3,701) - maybe I can fill in values based on information found in categories.  
More information may be available from the place-details API, which can also return the Wikipedia ID and social media links.  
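
As a rough illustration of filling cuisine from categories: the sample records above carry entries such as `catering.restaurant.pizza` and `catering.restaurant.greek`, so the last segment can serve as a cuisine label. A minimal sketch, assuming that naming pattern holds for the other records too; `cuisine_from_categories` is a hypothetical helper:

```python
PREFIX = "catering.restaurant."


def cuisine_from_categories(categories):
    """Return the first cuisine label found in a category list, or None."""
    if not categories:
        return None
    for category in categories:
        if category.startswith(PREFIX):
            # 'catering.restaurant.pizza' -> 'pizza'
            return category[len(PREFIX):]
    return None


example = cuisine_from_categories(
    ["building", "building.catering", "catering",
     "catering.restaurant", "catering.restaurant.pizza"]
)
```

On the dataframe this could be used as `df['cuisine'].fillna(df['categories'].apply(cuisine_from_categories))`, which only touches rows where cuisine is missing.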

### Details API

In [25]:
# Geoapify - 3000 queries per day
# Place-details API

# Set to True to use API
query_data = False

if query_data:

    all_data_details = []
    for i in grid_layout.iterrows():

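        # query parameters
        # NOTE: the column names in the grid file are swapped: `lat` holds
        # longitude values and `lon` holds latitude values
        long,lat = i[1].lat,i[1].lon
        features = "radius_500.restaurant"
        lang = "en"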

        REQUEST_PARAMS = {
            "lat":lat,
            "lon":long,
            "features":features,
            "lang":lang,
            "apiKey":geoapify_key    
        }

        # get additional details
        place_details_url = 'https://api.geoapify.com/v2/place-details'

        geo_data = requests.get(place_details_url, params=REQUEST_PARAMS).json()

        # Print the json (pretty printed)
        # print(json.dumps(geo_data, indent=4, sort_keys=True))
        all_data_details.append(geo_data)


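    # play sound when done; display() is needed because an Audio object is only
    # rendered automatically when it is the last expression in a cell
    from IPython.display import display
    display(Audio(sound_file, autoplay=True))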

    # save list of dictionaries to file
    with open('./data/manipulated/geo_data_details.json', 'w') as f:
        json.dump(all_data_details, f, indent=4)

In [None]:
# load data from previous api query
with open('./data/manipulated/geo_data_details.json', 'r') as f:
    all_data_details = json.load(f)

In [27]:
# review one retrieved record
all_data_details[4]

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'properties': {'feature_type': 'radius_500',
    'type': 'radius',
    'range': 500,
    'lat': 41.66118220092816,
    'lon': -87.55741526218534,
    'area': 785893},
   'geometry': {'type': 'Polygon',
    'coordinates': [[[-87.55741526218534, 41.66567880274678],
      [-87.55800525194248, 41.665657148871524],
      [-87.55858955859478, 41.66559239582785],
      [-87.55916255381423, 41.665485167351655],
      [-87.55971871829712, 41.66533649632076],
      [-87.56025269495825, 41.665147814799646],
      [-87.56075934055774, 41.66492094023718],
      [-87.5612337752623, 41.66465805795035],
      [-87.56167142966206, 41.66436170006343],
      [-87.56206808878979, 41.66403472110598],
      [-87.56241993271777, 41.663680270505196],
      [-87.56272357334134, 41.66330176223823],
      [-87.5629760869948, 41.66290284193744],
      [-87.56317504258654, 41.662487351765954],
      [-87.5633185249828, 41.662059293402336],
      [-

#### Review
The details API did not provide much extra information because many of the interesting fields have missing values.  
I will return to this later when I have more time.  
I think I will need to modify the original search criteria so that, instead of searching every 0.5 mile radius on a grid, I search every 0.25 mile radius.  
I will also need to plot the restaurant locations found by the API to see if there are any patterns.  
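
A finer search grid could be generated along these lines. This is a minimal sketch that converts miles to degrees with a flat approximation (about 111,320 metres per degree of latitude, scaled by the cosine of the latitude for longitude); the function name `make_grid` and the bounding box are assumptions for illustration, not the values used to build `chicago_point_grid.csv`.

```python
import math

METERS_PER_MILE = 1609.344
METERS_PER_DEG_LAT = 111_320  # rough average; good enough at city scale


def make_grid(lat_min, lat_max, lon_min, lon_max, spacing_miles):
    """Return (lat, lon) points spaced roughly spacing_miles apart."""
    step_m = spacing_miles * METERS_PER_MILE
    lat_step = step_m / METERS_PER_DEG_LAT
    # a degree of longitude shrinks with latitude, so scale by cos(latitude)
    mid_lat = (lat_min + lat_max) / 2
    lon_step = step_m / (METERS_PER_DEG_LAT * math.cos(math.radians(mid_lat)))
    n_lat = int((lat_max - lat_min) / lat_step) + 1
    n_lon = int((lon_max - lon_min) / lon_step) + 1
    return [
        (lat_min + i * lat_step, lon_min + j * lon_step)
        for i in range(n_lat)
        for j in range(n_lon)
    ]


# Hypothetical bounding box covering a small patch of the south side
coarse = make_grid(41.66, 41.70, -87.64, -87.60, spacing_miles=0.5)
fine = make_grid(41.66, 41.70, -87.64, -87.60, spacing_miles=0.25)
```

Halving the spacing roughly quadruples the number of grid points, and therefore the number of API calls, which matters given the 3000 queries per day noted in the code comments.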