In [149]:
# imports

import requests
import os
import pandas as pd


# Foursquare

## Julie's Notes:
From the Foursquare categories list (https://location.foursquare.com/places/docs/categories), I compiled the following categories (sometimes combined), which I send in the "categories" query parameter for each bike station location:

<img src='../images/foursquare_categories.png'>

These decisions attempt to balance the API call limit (40,000) against the response item limit (maxes out at 50 items per call).
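
As a rough check on that trade-off, the total number of calls is the number of stations times the number of category groups. The sketch below assumes 245 stations (the count reported by the partition check later in this notebook) and the 6 category groups used here; `total_calls` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical helper: estimate how many API calls one full pass needs.
def total_calls(n_stations: int, n_category_groups: int) -> int:
    return n_stations * n_category_groups

FOURSQUARE_CALL_LIMIT = 40_000  # limit quoted in the notes above

# 245 stations and 6 category groups are assumptions taken from this notebook
calls_needed = total_calls(245, 6)
print(calls_needed, calls_needed <= FOURSQUARE_CALL_LIMIT)  # prints: 1470 True
```

Splitting the categories into more groups raises the call count but makes it less likely that any single call hits the 50-item cap.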

Send a request to Foursquare with a small radius (1000m) for all the bike stations in your city of choice. 
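
One way to assemble the request URL without manual string concatenation is `urllib.parse.urlencode` from the standard library. This is only a sketch: `build_place_search_url` is a hypothetical name, and the parameter set mirrors the one this notebook sends (`ll`, `radius`, `categories`, `limit`, `sort_by`).

```python
from urllib.parse import urlencode

FSQ_PLACE_SEARCH = 'https://api.foursquare.com/v3/places/search'

def build_place_search_url(lat, lon, radius, categories, limit=50, sort_by='distance'):
    """Hypothetical helper: build the Place Search URL for one station."""
    params = {
        'll': f"{lat},{lon}",
        'radius': radius,
        'categories': categories,  # comma-separated category ids, e.g. '16004'
        'limit': limit,
        'sort_by': sort_by,
    }
    # safe=',' keeps the commas in 'll' and 'categories' instead of encoding them as %2C
    return FSQ_PLACE_SEARCH + '?' + urlencode(params, safe=',')

url = build_place_search_url(49.291909, -123.140713, 1000, '13032')
print(url)
```

The resulting string has the same shape as the URLs the notebook logs for each call, so it could be passed to `requests.get` together with the same headers.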

In [150]:
# Pull my FOURSQUARE API key into a variable
FOURSQUARE_KEY = os.getenv('FOURSQUARE_API_KEY')

In [151]:
# Function Definitions

# Define function that will make the GET request to Foursquare
#def yelp_get_request_business_search(station_latitude, station_longitude, radius, categories, API_KEY):
def foursquare_get_request_place_search(station_latitude, station_longitude, radius, categories, API_KEY):

    base_foursquare_endpoint = 'https://api.foursquare.com/v3'
    place_search = '/places/search'

    # Default Query Parameters for all our GET requests, that aren't otherwise passed in
    limit = 50  # Always get as many as allowed
    sort_by = 'distance'  # Foursquare says 'ratings' is a valid sort option, but I haven't seen evidence in their payload that they have ratings!

    # Craft the request_url:
    request_url = base_foursquare_endpoint + place_search + \
    '?' + \
    'll=' + str(station_latitude) + ',' + str(station_longitude) + \
    '&radius=' + str(radius) + \
    '&categories=' + categories + \
    '&limit=' + str(limit) + \
    '&sort_by=' + sort_by

    header_dict = {
        'accept': 'application/json',
        'Authorization': API_KEY  # use the key passed in as an argument, not the global
    }
    
    # DEBUG
    #print(f"Inside foursquare_get_request: request_url = {request_url}, header_dict = {header_dict}")
    print(f"     Calling API: request_url = {request_url}")

    # Make the call, get response out
    response = requests.get(request_url, headers=header_dict)

    # Return the payload_dict to caller
    return response.json()
    

# Define function that will create the default fsq_dict for each GET request:
def create_default_fsqdict(fsqdict):
    fsqdict.clear()
    fsqdict['station_id']= []
    fsqdict['place_id'] = []
    fsqdict['name'] = []
    fsqdict['distance'] = []
    fsqdict['address'] = []
    fsqdict['city'] = []
    fsqdict['postal'] = []
    fsqdict['category_id'] = []
    fsqdict['category_name'] = []
    fsqdict['query_categories'] = []
    fsqdict['query_category_text'] = []


# Define function that will parse the JSON-formatted response
def fsqdict_from_response(stationid, fsqdict, jsonpayload, query_categories, query_category_text):

    #print(f"I'm in fsqdict_from_response, and fsqdict is:\n{fsqdict}")
    # Conditional expressions guard against missing keys: testing showed some entries lack 'postcode'.
    # With 245 stations * 6 category groups = 1470 calls, a single missing key would abort the whole run,
    # so name/distance/address/city/postcode are each checked for presence.
    results_array = jsonpayload['results']
    for result in results_array:
        fsqdict['station_id'].append(stationid)
        fsqdict['place_id'].append(result['fsq_id'])
        fsqdict['name'].append(result['name'] if 'name' in result else 'N/A')
        fsqdict['distance'].append(result['distance'] if 'distance' in result else float('nan'))  # bare NaN is undefined in Python
        fsqdict['address'].append(result['location']['address'] if 'address' in result['location'] else 'N/A')
        fsqdict['city'].append(result['location']['locality'] if 'locality' in result['location'] else 'N/A')
        fsqdict['postal'].append(result['location']['postcode'] if 'postcode' in result['location'] else 'N/A')
        
        categories_array = result['categories']
        id_string = ''
        categoryname_string = ''
        for entry in categories_array:
            id_string += str(entry['id']) + '|'
            categoryname_string += entry['name'] + '|'
        
        id_string = id_string[:-1]
        categoryname_string = categoryname_string[:-1]
        fsqdict['category_id'].append(id_string)
        fsqdict['category_name'].append(categoryname_string)
        fsqdict['query_categories'].append(query_categories)
        fsqdict['query_category_text'].append(query_category_text)

    return fsqdict

In [121]:
# # Test with static JSON file:  Delete me after
# import json
# json_file = open('C:/Users/raref/Lighthouse/W05D02_Data_Wrangling_Challenge_Walkthrough/Other_data_types_exercise/payload_postman_cambie_fsq_coffee.json', 'r')
# payload_dict = json.load(json_file)
# payload_dict

{'results': [{'fsq_id': '4b1db335f964a520211424e3',
   'categories': [{'id': 13034,
     'name': 'Café',
     'short_name': 'Café',
     'plural_name': 'Cafés',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/cafe_',
      'suffix': '.png'}},
    {'id': 13263,
     'name': 'Japanese Restaurant',
     'short_name': 'Japanese',
     'plural_name': 'Japanese Restaurants',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/japanese_',
      'suffix': '.png'}}],
   'chains': [],
   'closed_bucket': 'VeryLikelyOpen',
   'distance': 96,
   'geocodes': {'main': {'latitude': 49.263328, 'longitude': -123.11402},
    'roof': {'latitude': 49.263328, 'longitude': -123.11402}},
   'link': '/v3/places/4b1db335f964a520211424e3',
   'location': {'address': '451W Broadway W',
    'country': 'CA',
    'cross_street': 'at Cambie St',
    'formatted_address': '451W Broadway W (at Cambie St), Vancouver BC V5Y 1R4',
    'locality': 'Vancouver',
    'postcode': 'V5Y 1R4'

In [153]:
# Categories - 6 API calls for each station_id (each of these strings for 'categories=' in the query parameter)
# Unlike Yelp, this needs a dictionary because the category_ids are numerical and hard for humans to identify easily when looking at the eventual dataframe
categories_dict = {
    '10027,10047,10059,10069' : 'Arts and Entertainment: Museum|Public Art',
    '13032' : 'Dining and Drinking: Cafe, Coffee and Tea House',
    '16003,16020,16046' : 'Landmarks and Outdoors: Beach|Historic Site|Scenic Lookout',
    '16004' : 'Bike Trail',
    '16032' : 'Park',
    '19010,19013,19014,19019' : 'Travel and Transportation: B&B|Hostel|Hotel|Vacation Rental'
}

# Set Default Radius
radius = 1000

# Load the citybikes dataframe
stations_df = pd.read_csv('../data/citybikes_vancouver.csv')

# Generate the list of station_ids
station_ids_list = stations_df['id'].tolist()
station_ids_list = sorted(station_ids_list)

# Set up an empty dataframe 'rolling_df' which will collect each individual API call into the larger dataframe
rolling_df = pd.DataFrame()

#for station_id in station_ids_list[:1]:  # Testing only
for station_id in station_ids_list:
    print(f"************** New Station! **************")
    for entry_category, category_text in categories_dict.items():
        filt_station = (stations_df['id'] == station_id)
        station_lat = stations_df.loc[filt_station]['lat'].values[0]
        station_long = stations_df.loc[filt_station]['long'].values[0]
        print (f"Working on station_id: {station_id}, categories being sent is: {entry_category} ({category_text})")
        print (f"     station_lat, station_long = ({station_lat}, {station_long})")

        # Call the function to do Foursquare GET request from the API
        payload_dict = foursquare_get_request_place_search(station_lat, station_long, radius, entry_category, FOURSQUARE_KEY)
        
        # Define/reset the fsqdict dictionary, which holds the parsed JSON from the REST GET API call
        fsqdict = dict()
        create_default_fsqdict(fsqdict)

        # Parse the JSON from the payload from the API call
        fsqdict_from_response(station_id, fsqdict, payload_dict, entry_category, category_text)
        
        # Create a temp_df dataframe
        temp_df = pd.DataFrame(fsqdict)

        # Debug
        print(f"          number rows = {temp_df.shape[0]}")

        # Add the just-generated dataframe from this singular previous API call, to the cumulative "rolling_df" dataframe for the entirety of the station_ids
        rolling_df = pd.concat([rolling_df, temp_df], ignore_index=True)

************** New Station! **************
Working on station_id: 00fa94ad698dc4a9e4d708d6fd32f294, categories being sent is: 10027,10047,10059,10069 (Arts and Entertainment: Museum|Public Art)
     station_lat, station_long = (49.291909, -123.140713)
     Calling API: request_url = https://api.foursquare.com/v3/places/search?ll=49.291909,-123.140713&radius=1000&categories=10027,10047,10059,10069&limit=50&sort_by=distance
          number rows = 3
Working on station_id: 00fa94ad698dc4a9e4d708d6fd32f294, categories being sent is: 13032 (Dining and Drinking: Cafe, Coffee and Tea House)
     station_lat, station_long = (49.291909, -123.140713)
     Calling API: request_url = https://api.foursquare.com/v3/places/search?ll=49.291909,-123.140713&radius=1000&categories=13032&limit=50&sort_by=distance
          number rows = 32
Working on station_id: 00fa94ad698dc4a9e4d708d6fd32f294, categories being sent is: 16003,16020,16046 (Landmarks and Outdoors: Beach|Historic Site|Scenic Lookout)
     s

Parse through the response to get the POI (such as restaurants, bars, etc) details you want (ratings, name, location, etc)
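
The presence checks used when parsing can also be written with `dict.get`, which returns a default when a key is missing. The sketch below is an alternative formulation, not the notebook's own function: `extract_place_row` is a hypothetical helper and the sample record is made up, shaped like the Foursquare payload shown earlier.

```python
def extract_place_row(station_id, result):
    """Hypothetical helper: flatten one Foursquare result into a row dict."""
    location = result.get('location', {})
    categories = result.get('categories', [])
    return {
        'station_id': station_id,
        'place_id': result.get('fsq_id'),
        'name': result.get('name', 'N/A'),
        'distance': result.get('distance', float('nan')),
        'address': location.get('address', 'N/A'),
        'city': location.get('locality', 'N/A'),
        'postal': location.get('postcode', 'N/A'),
        # Join multiple categories with '|', as the notebook does
        'category_id': '|'.join(str(c['id']) for c in categories),
        'category_name': '|'.join(c['name'] for c in categories),
    }

# Made-up sample record; 'postcode' is deliberately missing.
sample = {
    'fsq_id': 'example-id',
    'name': 'Example Cafe',
    'distance': 96,
    'location': {'address': '1 Example St', 'locality': 'Vancouver'},
    'categories': [{'id': 13034, 'name': 'Café'}, {'id': 13263, 'name': 'Japanese Restaurant'}],
}
row = extract_place_row('station-1', sample)
print(row['postal'], row['category_id'], row['category_name'])
```

A list of such row dicts can be passed straight to `pd.DataFrame`, which avoids resetting a dict of lists before every call.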

In [154]:
rolling_df.shape

(18007, 11)

In [155]:
rolling_df.head(50)

Unnamed: 0,station_id,place_id,name,distance,address,city,postal,category_id,category_name,query_categories,query_category_text
0,00fa94ad698dc4a9e4d708d6fd32f294,4b5b84c0f964a5202f0429e3,The Inukshuk,869.0,1700 Beach Ave,Vancouver,,10047|13065|16016|19014,Public Art|Restaurant|Fountain|Hotel,10027100471005910069,Arts and Entertainment: Museum|Public Art
1,00fa94ad698dc4a9e4d708d6fd32f294,4ded40b4fa76b21ed97ff4db,Lord Stanley of Preston statue,747.0,900 Stanley Park Dr,Vancouver,V6G 3E2,10047|16026,Public Art|Monument,10027100471005910069,Arts and Entertainment: Museum|Public Art
2,00fa94ad698dc4a9e4d708d6fd32f294,4e5955281f6e804280c4bad1,Roedde House Museum,833.0,1415 Barclay St,Vancouver,V6G 1J6,10030,History Museum,10027100471005910069,Arts and Entertainment: Museum|Public Art
3,00fa94ad698dc4a9e4d708d6fd32f294,4aa9ac4ff964a520cc5420e3,Cardero Bottega,647.0,1016 Cardero St,Vancouver,V6G 2H1,13035|13039|13145,Coffee Shop|Deli|Fast Food Restaurant,13032,"Dining and Drinking: Cafe, Coffee and Tea House"
4,00fa94ad698dc4a9e4d708d6fd32f294,52c72fe5498edf109f363e18,Greenhorn Cafe,760.0,994 Nicola St,Vancouver,V6G 2C8,13034|13035|13065,Café|Coffee Shop|Restaurant,13032,"Dining and Drinking: Cafe, Coffee and Tea House"
5,00fa94ad698dc4a9e4d708d6fd32f294,4ad008e2f964a5205fd720e3,Delany's Coffee House,413.0,1105 Denman St,Vancouver,V6G 2M7,13035|13065,Coffee Shop|Restaurant,13032,"Dining and Drinking: Cafe, Coffee and Tea House"
6,00fa94ad698dc4a9e4d708d6fd32f294,52e19f99498ee190912a3ff0,Pappa Roti,793.0,1505 Robson St,Vancouver,V6G 1C3,13002|13034|13035,Bakery|Café|Coffee Shop,13032,"Dining and Drinking: Cafe, Coffee and Tea House"
7,00fa94ad698dc4a9e4d708d6fd32f294,531f860e498e932bf3c1ecfb,JJ Bean Coffee Roasters,637.0,1209 Bidwell St,Vancouver,V6G 2K7,13035|13065,Coffee Shop|Restaurant,13032,"Dining and Drinking: Cafe, Coffee and Tea House"
8,00fa94ad698dc4a9e4d708d6fd32f294,4acfe023f964a52099d620e3,Starbucks,518.0,1795 Davie St,Vancouver,V6G 1W5,13035,Coffee Shop,13032,"Dining and Drinking: Cafe, Coffee and Tea House"
9,00fa94ad698dc4a9e4d708d6fd32f294,4b197914f964a520f8dd23e3,Red Umbrella Cafe,625.0,1707 Davie St,Vancouver,V6G 1W5,13034|13035|13051,Café|Coffee Shop|Fish and Chips Shop,13032,"Dining and Drinking: Cafe, Coffee and Tea House"


In [156]:
rolling_df.groupby('query_category_text').size()

query_category_text
Arts and Entertainment: Museum|Public Art                      1081
Bike Trail                                                       26
Dining and Drinking: Cafe, Coffee and Tea House                8113
Landmarks and Outdoors: Beach|Historic Site|Scenic Lookout      537
Park                                                           3861
Travel and Transportation: B&B|Hostel|Hotel|Vacation Rental    4389
dtype: int64
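
Since each call returns at most 50 items, a call that produced exactly 50 rows may have been truncated. The sketch below counts such calls; `capped_calls` is a hypothetical helper and the input data is made up for illustration.

```python
from collections import Counter

def capped_calls(call_keys, limit=50):
    """Hypothetical helper: return the (station_id, query_categories) pairs whose
    call returned exactly `limit` rows, i.e. calls that may have been truncated."""
    counts = Counter(call_keys)
    return [key for key, n in counts.items() if n == limit]

# Made-up example: one call returned 50 rows, another returned 3.
keys = [('station-a', '13032')] * 50 + [('station-a', '16032')] * 3
print(capped_calls(keys))
```

With the dataframe built above, the keys could be produced with `zip(rolling_df['station_id'], rolling_df['query_categories'])`.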

Put your parsed results into a DataFrame

In [157]:
# Save the dataframe as .csv file

rolling_df.to_csv('../data/fsq_vancouver.csv', index=False)  # Saved on 2023-10-20 afternoon

# Yelp

## Julie's Notes:

From the Yelp categories list (https://docs.developer.yelp.com/docs/resources-categories), I compiled the following categories (sometimes combined), which I send in the "categories" query parameter for each bike station location:

<img src='../images/yelp_categories.png'>

These decisions attempt to balance the API call limit (500 per day) against the response item limit (maxes out at 50 items per call).


Send a request to Yelp with a small radius (1000m) for all the bike stations in your city of choice. 
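
Because the parsing function defined later in this section indexes `jsonpayload['businesses']` directly, a response without that key (for example, one returned after the daily limit is exhausted) would surface only as a bare `KeyError`. The guard below is a sketch: `businesses_or_raise` is a hypothetical helper, and the error shape `{'error': {'code': ..., 'description': ...}}` is an assumption that should be checked against Yelp's documentation.

```python
def businesses_or_raise(payload):
    """Hypothetical guard: return the 'businesses' list, or raise with the API's error details."""
    if 'businesses' not in payload:
        # Assumed error shape: {'error': {'code': ..., 'description': ...}}
        error = payload.get('error', {})
        code = error.get('code', 'UNKNOWN')
        description = error.get('description', '')
        raise RuntimeError(f"Yelp call failed: {code}: {description}")
    return payload['businesses']

# Made-up payloads for illustration only.
print(len(businesses_or_raise({'businesses': [{'id': 'x'}]})))
try:
    businesses_or_raise({'error': {'code': 'EXAMPLE_CODE', 'description': 'demo failure'}})
except RuntimeError as exc:
    print(exc)
```

Failing loudly at the first bad response makes it easier to see which station a partition stopped at.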

In [5]:
# Pull my YELP API key into a variable
YELP_KEY = os.getenv('YELP_API_KEY')

In [60]:
# Substitute this until we can get it working
import json
json_file = open('C:/Users/raref/Lighthouse/W05D02_Data_Wrangling_Challenge_Walkthrough/Other_data_types_exercise/payload_postman_cambie_yelp_coffee.json', 'r')
payload_dict = json.load(json_file)
payload_dict

{'businesses': [{'id': '6iOAgzJ0DRZNSKA3FSrrOg',
   'alias': 'la-taqueria-pinche-taco-shop-vancouver',
   'name': 'La Taqueria Pinche Taco Shop',
   'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/fjhIj3XKuQ4mquD4Mg8OoQ/o.jpg',
   'is_closed': False,
   'url': 'https://www.yelp.com/biz/la-taqueria-pinche-taco-shop-vancouver?adjust_creative=5xMXIdPreqyQMiwJq_zdCg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=5xMXIdPreqyQMiwJq_zdCg',
   'review_count': 683,
   'categories': [{'alias': 'mexican', 'title': 'Mexican'}],
   'rating': 4.0,
   'coordinates': {'latitude': 49.263559, 'longitude': -123.112736},
   'transactions': [],
   'price': '$$',
   'location': {'address1': '2450 Yukon Street',
    'address2': '',
    'address3': '',
    'city': 'Vancouver',
    'zip_code': 'V5Z 3V6',
    'country': 'CA',
    'state': 'BC',
    'display_address': ['2450 Yukon Street',
     'Vancouver, BC V5Z 3V6',
     'Canada']},
   'phone': '+16045582549',
   'display_phone': 

In [61]:
# Function Definitions

# Define function that will make the GET request to Yelp
def yelp_get_request_business_search(station_latitude, station_longitude, radius, categories, API_KEY):

    base_yelp_endpoint = 'https://api.yelp.com/v3'
    business_search = '/businesses/search'

    # Default Query Parameters for all our GET requests, that aren't otherwise passed in
    limit = 50  # Always get as many as allowed
    sort_by = 'best_match'

    # Craft the request_url:
    request_url = base_yelp_endpoint + business_search + \
    '?' + \
    'latitude=' + str(station_latitude) + \
    '&longitude=' + str(station_longitude) + \
    '&radius=' + str(radius) + \
    '&categories=' + categories + \
    '&limit=' + str(limit) + \
    '&sort_by=' + sort_by

    header_dict = {
        'accept': 'application/json',
        'Authorization' : 'Bearer ' + API_KEY
        }
    
    # DEBUG
    #print(f"Inside yelp_get_request: request_url = {request_url}, header_dict = {header_dict}")

    # Make the call, get response out
    response = requests.get(request_url, headers=header_dict)

    # Return the payload_dict to caller
    return response.json()
    

# Define function that will create the default yelp_dict for each GET request:
def create_default_yelpdict(yelpdict):
    yelpdict.clear()
    yelpdict['station_id']= []
    yelpdict['place_id'] = []
    yelpdict['name'] = []
    yelpdict['distance'] = []
    yelpdict['address'] = []
    yelpdict['city'] = []
    yelpdict['postal'] = []
    yelpdict['review_count'] = []
    yelpdict['rating'] = []
    yelpdict['category_id'] = []
    yelpdict['category_name'] = []
    yelpdict['query_categories'] = []


# Define function that will parse the JSON-formatted response
def yelpdict_from_response(stationid, yelpdict, jsonpayload, query_categories):

    #print(f"I'm in yelpdict_from_response, and yelpdict is:\n{yelpdict}")
    businesses_array = jsonpayload['businesses']
    for biz in businesses_array:
        yelpdict['station_id'].append(stationid)
        yelpdict['place_id'].append(biz['id'])
        yelpdict['name'].append(biz['name'])
        yelpdict['distance'].append(biz['distance'])
        yelpdict['address'].append(biz['location']['address1'])
        yelpdict['city'].append(biz['location']['city'])
        yelpdict['postal'].append(biz['location']['zip_code'])
        yelpdict['review_count'].append(biz['review_count'])
        yelpdict['rating'].append(biz['rating'])

        categories_array = biz['categories']
        alias_string = ''
        title_string = ''
        for entry in categories_array:
            alias_string += entry['alias'] + '|'
            title_string += entry['title'] + '|'
        
        alias_string = alias_string[:-1]
        title_string = title_string[:-1]
        yelpdict['category_id'].append(alias_string)
        yelpdict['category_name'].append(title_string)
        yelpdict['query_categories'].append(query_categories)

    return yelpdict

In [39]:
mystation_latitude = 49.274566
mystation_longitude = -123.121817
mystation_id = '32603a87cfca71d0f7dfa3513bad69d5'
myradius=1000
mycategories='beaches,parks,bicyclepaths,mountainbiking,museums,hostels,hotels'
myresponse = yelp_get_request_business_search(mystation_latitude, mystation_longitude, myradius, mycategories, YELP_KEY)
print(myresponse)
#yelp_get_request_business_search(station_latitude, station_longitude, radius, categories, API_KEY)

Inside yelp_get_request: request_url = https://api.yelp.com/v3/businesses/search?latitude=49.274566&longitude=-123.121817&radius=1000&categories=beaches,parks,bicyclepaths,mountainbiking,museums,hostels,hotels&limit=50&sort_by=best_match, header_dict = {'accept': 'application/json', 'Authorization': 'Bearer <REDACTED>'}
{'businesses': [{'id': 'DIf1ux1zR8cHp9neCEoyYg', 'alias': 'opus-hotel-vancouver-vancouver-2', 'name': 'OPUS Hotel Vancouver', 'image_url': 'https://s3-media4.fl.yelpcdn.com/bphoto/NY1Y1pQyXN9B49KodFqZHA/o.jpg', 'is_closed': False, 'url': 'https://www.yelp.com/biz/opus-hotel-vancouver-vancouver-2?adjust_creative=5xMXIdPreqyQMiwJq_zdCg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=5xMXIdPreqyQMiwJq_zdCg', 'review_count': 179, 'categories': [{'alias': 'hotels', 'title': 'Hotels'}], 'rating': 4.5, 'coordinates': {'latitude': 49.274667

In [42]:
type(myresponse)
bus_array = myresponse['businesses']
len(bus_array)

50

In [81]:
# Define Yelp query string categories / category names that will be sent in GET requests
# Adjusted - this was too many API calls
# category_mapping = {
#     "beaches,parks": "Beaches, Parks",
#     "bicyclepaths,mountainbiking": "Bicycle Paths, Mountain Biking",
#     "museums": "Museums",
#     "coffee": "Coffee & Tea",
#     "juicebars": "Juice Bars & Smoothies",
#     "hostels,hotels": "Hostels, Hotels"
# }

# Temporary 2nd YELP KEY with jtwleung (key redacted; never commit API keys):  TODO:  Don't forget to delete below
#YELP_KEY = '<REDACTED>'

# Categories - 3 API calls for each station_id (each of these strings for categories= in the query parameter)
categories = ['beaches,parks,bicyclepaths,mountainbiking', 'museums', 'hostels,hotels']

# Set Default Radius
radius = 1000

# Load the citybikes dataframe
stations_df = pd.read_csv('../data/citybikes_vancouver.csv')

# Generate the list of station_ids
station_ids_list = stations_df['id'].tolist()
station_ids_list = sorted(station_ids_list)

# Break into partition sizes to maximize usage of Yelp's 500 daily call limit (resets at 6pm Mountain == midnight UTC) and 3 calls per station_id in 245 stations
partition_1_size = 122
partition_2_size = 36

partition_1 = station_ids_list[:partition_1_size]
partition_2 = station_ids_list[partition_1_size: partition_1_size + partition_2_size]
partition_3 = station_ids_list[partition_1_size + partition_2_size:]  # is 87 ids long

# Set up an empty dataframe 'rolling_df' which will collect each individual API call into the larger dataframe
rolling_df = pd.DataFrame()

#for station_id in station_ids_list:  # Can't use this because Yelp's daily limit will cause this to fail mid-way
#for station_id in partition_1:  # Used this on 2023-10-19
#for station_id in partition_2:  # Used this on 2023-10-19 to hit remainder of calls for the day
for station_id in partition_3:  # Used this on 2023-10-20 to hit remainder of calls for the day
    print(f"************** New Station! **************")
    for entry_category in categories:
        filt_station = (stations_df['id'] == station_id)
        station_lat = stations_df.loc[filt_station]['lat'].values[0]
        station_long = stations_df.loc[filt_station]['long'].values[0]
        print (f"Working on station_id: {station_id}, categories being sent is: {entry_category}")
        print (f"     station_lat, station_long = ({station_lat}, {station_long})")

        # Call the function to do YELP GET request from the API (each call counts against the daily limit)
        payload_dict = yelp_get_request_business_search(station_lat, station_long, radius, entry_category, YELP_KEY)
        
        # Define/reset the yelpdict dictionary, which holds the parsed JSON from the REST GET API call
        yelpdict = dict()
        create_default_yelpdict(yelpdict)

        # Parse the JSON from the payload from the API call
        yelpdict_from_response(station_id, yelpdict, payload_dict, entry_category)
        
        # Create a temp_df dataframe
        temp_df = pd.DataFrame(yelpdict)

        # Add the just-generated dataframe from this singular previous API call, to the cumulative "rolling_df" dataframe for the entirety of the station_ids
        rolling_df = pd.concat([rolling_df, temp_df], ignore_index=True)

************** New Station! **************
Working on station_id: a74744ce4bb7ea2aa9f406ac8bff95d8, categories being sent is: beaches,parks,bicyclepaths,mountainbiking
     station_lat, station_long = (49.280977, -123.035969)
Inside yelp_get_request: request_url = https://api.yelp.com/v3/businesses/search?latitude=49.280977&longitude=-123.035969&radius=1000&categories=beaches,parks,bicyclepaths,mountainbiking&limit=50&sort_by=best_match, header_dict = {'accept': 'application/json', 'Authorization': 'Bearer <REDACTED>'}
Working on station_id: a74744ce4bb7ea2aa9f406ac8bff95d8, categories being sent is: museums
     station_lat, station_long = (49.280977, -123.035969)
Inside yelp_get_request: request_url = https://api.yelp.com/v3/businesses/search?latitude=49.280977&longitude=-123.035969&radius=1000&categories=museums&limit=50&sort_by=best_match, header_dict = {'accept': '

Unnamed: 0,station_id,place_id,name,distance,address,city,postal,review_count,rating,category_id,category_name,query_categories
0,a74744ce4bb7ea2aa9f406ac8bff95d8,EByyWFFnSnrmQZct_j6MBg,New Brighton Park,948.481468,3201 New Brighton Road,Vancouver,V5K 5J7,16.0,4.5,beaches|parks,Beaches|Parks,"beaches,parks,bicyclepaths,mountainbiking"
1,a74744ce4bb7ea2aa9f406ac8bff95d8,_BkYJw9plA2tYLoToa9mGQ,Il Giardino Italiano,78.531770,2901 E Hastings St,Vancouver,V5K 5J1,2.0,4.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
2,a74744ce4bb7ea2aa9f406ac8bff95d8,_UWGRNA9Jkbt72_3NXpNcw,Hastings Park,291.862881,2901 E Hastings Street,Vancouver,V5K 5J1,9.0,4.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
3,a74744ce4bb7ea2aa9f406ac8bff95d8,d4PXZQGnXuCE7akD7PQ_TA,Rupert Park Pitch & Putt,959.110609,3402 Charles Street,Vancouver,V5K 5H9,9.0,4.0,parks|golf,Parks|Golf,"beaches,parks,bicyclepaths,mountainbiking"
4,a74744ce4bb7ea2aa9f406ac8bff95d8,LBVCFvtVtDwzOc2R0EqwfA,Adanac Park,875.246088,1025 Boundary Road,Vancouver,V5K 4T2,1.0,3.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
...,...,...,...,...,...,...,...,...,...,...,...,...
95,aa1cbf97abc3cd50515bc54633a9bb2a,8snL4a-AN2JHWxgK62riUQ,Hyatt Regency Vancouver,585.993327,655 Burrard Street,Vancouver,V6C 2R7,302.0,4.0,hotels|venues,Hotels|Venues & Event Spaces,"hostels,hotels"
96,aa1cbf97abc3cd50515bc54633a9bb2a,E1FGWviFkHNH9NfnxAEaVw,Executive Hotel LeSoleil,416.380860,567 Hornby Street,Vancouver,V6C 2E8,88.0,4.5,hotels,Hotels,"hostels,hotels"
97,aa1cbf97abc3cd50515bc54633a9bb2a,KwMakQ-ozY7U7OySlRb-Kw,L'Hermitage Hotel,668.474308,788 Richards Street,Vancouver,V6B 3A4,149.0,4.5,hotels,Hotels,"hostels,hotels"
98,aa1cbf97abc3cd50515bc54633a9bb2a,1ZWwVOwu2BI7v7Hh-PlDGQ,Vancouver Marriott Pinnacle Downtown Hotel,651.654435,1128 West Hastings Street,Vancouver,V6E 4R5,185.0,4.0,hotels,Hotels,"hostels,hotels"


In [82]:
rolling_df['category_id'].value_counts()

category_id
hotels                               1144
parks                                 599
museums                               116
hotels|venues                         111
hostels                               107
hotels|bedbreakfast                    73
dog_parks                              62
parks|playgrounds                      48
beaches                                22
resorts|hotels|vacation_rentals        22
gardens|parks                          20
artmuseums                             19
museums|galleries                      19
catering|hotels                        18
landmarks|parks                        17
casinos|hotels|venues                  15
playgrounds|dog_parks                  15
beaches|parks                          15
hotels|divebars                        12
hiking|mountainbiking|bikerentals      12
waterparks|parks|playgrounds           11
parks|theater                          10
museums|galleries|venues                9
skate_parks           

In [83]:
rolling_df.shape
rolling_df.head(50)

Unnamed: 0,station_id,place_id,name,distance,address,city,postal,review_count,rating,category_id,category_name,query_categories
0,a74744ce4bb7ea2aa9f406ac8bff95d8,EByyWFFnSnrmQZct_j6MBg,New Brighton Park,948.481468,3201 New Brighton Road,Vancouver,V5K 5J7,16.0,4.5,beaches|parks,Beaches|Parks,"beaches,parks,bicyclepaths,mountainbiking"
1,a74744ce4bb7ea2aa9f406ac8bff95d8,_BkYJw9plA2tYLoToa9mGQ,Il Giardino Italiano,78.53177,2901 E Hastings St,Vancouver,V5K 5J1,2.0,4.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
2,a74744ce4bb7ea2aa9f406ac8bff95d8,_UWGRNA9Jkbt72_3NXpNcw,Hastings Park,291.862881,2901 E Hastings Street,Vancouver,V5K 5J1,9.0,4.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
3,a74744ce4bb7ea2aa9f406ac8bff95d8,d4PXZQGnXuCE7akD7PQ_TA,Rupert Park Pitch & Putt,959.110609,3402 Charles Street,Vancouver,V5K 5H9,9.0,4.0,parks|golf,Parks|Golf,"beaches,parks,bicyclepaths,mountainbiking"
4,a74744ce4bb7ea2aa9f406ac8bff95d8,LBVCFvtVtDwzOc2R0EqwfA,Adanac Park,875.246088,1025 Boundary Road,Vancouver,V5K 4T2,1.0,3.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
5,a74744ce4bb7ea2aa9f406ac8bff95d8,62f1QlwgVZr3pFHdjS542A,Callister Park,894.985004,2875 Oxford Street,Vancouver,V5K 1N6,1.0,3.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
6,a74744ce4bb7ea2aa9f406ac8bff95d8,w_JY6JHlUtDw09puGdEPxw,Atrium Inn,679.261964,2889 East Hastings Street,Vancouver,V5K 2A1,58.0,3.0,hotels|venues,Hotels|Venues & Event Spaces,"hostels,hotels"
7,a9f0b06d07f89b75e92c0cf686223aea,GlYICUHh-vsbdBsdjnRwMg,Harbour Green Park,415.206448,1199 W Cordova Street,Vancouver,V6C 0A1,15.0,4.5,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
8,a9f0b06d07f89b75e92c0cf686223aea,-fHqwIU4S-uD9O-ouzhE_A,Rainbow Park,847.369121,872 Richards Street,Vancouver,V6B 3B4,4.0,5.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
9,a9f0b06d07f89b75e92c0cf686223aea,KiRXFRtSzRbgU_55Nj845Q,Portal Park,428.915352,1099 W Hastings Street,Vancouver,V6E 4E2,1.0,5.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"


In [84]:
# Save the individual partitioned dataframes as .csv files

#rolling_df.to_csv('../data/yelp_vancouver_partition1.csv', index=False)  # Saved on 2023-10-20 evening for partition_1
#rolling_df.to_csv('../data/yelp_vancouver_partition2.csv', index=False)  # Saved on 2023-10-20 evening for partition_2
#rolling_df.to_csv('../data/yelp_vancouver_partition3.csv', index=False)  # Saved on 2023-10-21 for partition_3

In [76]:
# Calculations required to support breaking station_id list into partition sizes to maximize usage of Yelp's 500 daily call limit
# (resets at 6pm Mountain == midnight UTC) and 3 calls per station_id in 245 stations
partition_1_size = 122
partition_2_size = 36

partition_1 = station_ids_list[:partition_1_size]
partition_2 = station_ids_list[partition_1_size: partition_1_size + partition_2_size]
partition_3 = station_ids_list[partition_1_size + partition_2_size:]

print(len(partition_1))
print(len(partition_2))
print(len(partition_3))

total_ids_in_partitions = len(partition_1) + len(partition_2) + len(partition_3)
print(f"total ids in all partitions ({total_ids_in_partitions}) == stations_df.shape[0]: {total_ids_in_partitions == stations_df.shape[0]}")

122
36
87
total ids in all partitions (245) == stations_df.shape[0]: True
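
The partition sizes above can be checked against the daily budget with a little arithmetic. The sketch below uses the numbers from this notebook (500 calls per day, 3 calls per station, partitions of 122, 36 and 87); `calls_for` and `chunk` are hypothetical helpers, and `chunk` shows what an even split would look like if a full day's budget were available each day.

```python
DAILY_LIMIT = 500        # Yelp daily call limit quoted in the notes
CALLS_PER_STATION = 3    # one call per category string

def calls_for(partition_size, calls_per_station=CALLS_PER_STATION):
    """Hypothetical helper: number of API calls a partition consumes."""
    return partition_size * calls_per_station

def chunk(ids, max_calls=DAILY_LIMIT, calls_per_station=CALLS_PER_STATION):
    """Hypothetical helper: split ids into batches that each fit in one day's budget."""
    size = max_calls // calls_per_station  # 166 stations per day at 3 calls each
    return [ids[i:i + size] for i in range(0, len(ids), size)]

print([calls_for(n) for n in (122, 36, 87)])              # prints: [366, 108, 261]
print([len(batch) for batch in chunk(list(range(245)))])  # prints: [166, 79]
```

Partitions 1 and 2 together use 474 calls, which fits within a single day's limit of 500, and partition 3 uses 261 calls on the following day.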


Parse through the response to get the POI (such as restaurants, bars, etc) details you want (ratings, name, location, etc)

Put your parsed results into a DataFrame

In [93]:
# Need to pull each of the 3 partitions from 3 .csv files, into a singular dataframe for Yelp
partition1_df = pd.read_csv('../data/yelp_vancouver_partition1.csv')
partition2_df = pd.read_csv('../data/yelp_vancouver_partition2.csv')
partition3_df = pd.read_csv('../data/yelp_vancouver_partition3.csv')

yelp_df = pd.concat([partition1_df, partition2_df, partition3_df], ignore_index=True)

In [94]:
yelp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6684 entries, 0 to 6683
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   station_id        6684 non-null   object 
 1   place_id          6684 non-null   object 
 2   name              6684 non-null   object 
 3   distance          6684 non-null   float64
 4   address           6644 non-null   object 
 5   city              6684 non-null   object 
 6   postal            6659 non-null   object 
 7   review_count      6684 non-null   float64
 8   rating            6684 non-null   float64
 9   category_id       6684 non-null   object 
 10  category_name     6684 non-null   object 
 11  query_categories  6684 non-null   object 
dtypes: float64(3), object(9)
memory usage: 626.8+ KB
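
The non-null counts above translate directly into per-column completeness. The sketch below copies the two incomplete columns from the `yelp_df.info()` output and computes how many values are missing; it is plain arithmetic and does not touch the dataframe itself.

```python
TOTAL_ROWS = 6684
non_null = {'address': 6644, 'postal': 6659}  # counts copied from yelp_df.info() above

# Missing values per column, as a count and as a share of all rows
missing = {column: TOTAL_ROWS - count for column, count in non_null.items()}
for column, n_missing in missing.items():
    print(f"{column}: {n_missing} missing ({n_missing / TOTAL_ROWS:.2%})")
```

In the notebook itself, `yelp_df.isna().mean()` would give the same missing fractions for every column at once.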


In [95]:
yelp_df.head(10)

Unnamed: 0,station_id,place_id,name,distance,address,city,postal,review_count,rating,category_id,category_name,query_categories
0,00fa94ad698dc4a9e4d708d6fd32f294,kajMc2fkWKdzKJ1M4pm47Q,Stanley Park,978.386841,1166 Stanley Park Drive,Vancouver,V6G,1091.0,5.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
1,00fa94ad698dc4a9e4d708d6fd32f294,VoziJj_Fw67OtZtdDzrpQg,English Bay Beach Park,783.428693,1700 Beach Avenue,Vancouver,V6E 1V3,68.0,4.5,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
2,00fa94ad698dc4a9e4d708d6fd32f294,XHJTdq8QJp6_9oCj5hU85w,Vancouver Seawall,663.404115,,Vancouver,,101.0,5.0,hiking|parks,Hiking|Parks,"beaches,parks,bicyclepaths,mountainbiking"
3,00fa94ad698dc4a9e4d708d6fd32f294,EGZABxCmlA3PNwbSYXhLbA,Morton Park,426.335401,1800 Morton Avenue,Vancouver,V6G 1Z1,14.0,4.5,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
4,00fa94ad698dc4a9e4d708d6fd32f294,AVulOVkLG2LIRaOdOAmdlA,Lost Lagoon,328.531508,Lagoon Dr,Vancouver,V6G,18.0,4.5,parks|lakes,Parks|Lakes,"beaches,parks,bicyclepaths,mountainbiking"
5,00fa94ad698dc4a9e4d708d6fd32f294,2CxBAbnFIOfjRASbWcHC4w,Stanley Park 2nd Beach Picnic Area,715.128216,Ceperly 2nd Beach,Vancouver,V6G 3E2,8.0,3.5,beaches,Beaches,"beaches,parks,bicyclepaths,mountainbiking"
6,00fa94ad698dc4a9e4d708d6fd32f294,4563XS_PrPJivPv_R5sW3Q,Alexandra Park,721.708957,1755 Beach avenue,Vancouver,V6E 1V3,1.0,5.0,parks,Parks,"beaches,parks,bicyclepaths,mountainbiking"
7,00fa94ad698dc4a9e4d708d6fd32f294,kRl_c-eObP6vf3KbJ0fulw,Movies in the Park,701.08881,Stanley Park Dr,Vancouver,V6G,6.0,4.5,parks|arts,Parks|Arts & Entertainment,"beaches,parks,bicyclepaths,mountainbiking"
8,00fa94ad698dc4a9e4d708d6fd32f294,42Tg2jf217mRb_rqYpKAbw,Stanley Park Shuffleboard Court Area - Gated O...,395.498272,2000 W Georgia Street,Vancouver,V6G,1.0,3.0,dog_parks,Dog Parks,"beaches,parks,bicyclepaths,mountainbiking"
9,00fa94ad698dc4a9e4d708d6fd32f294,TBcn1EwTCv3EsF4SEI3s4w,Lovers Walk Trail,1287.4175,Lovers Walk,Vancouver,V6G,2.0,5.0,hiking|parks,Hiking|Parks,"beaches,parks,bicyclepaths,mountainbiking"


# Comparing Results

Which API provided you with more complete data? Provide an explanation. 

1. Look at the difference in number and granularity of categories (eyeballed):
Yelp:  https://docs.developer.yelp.com/docs/resources-categories
Foursquare:  https://location.foursquare.com/places/docs/categories

2.  Look at the different types of APIs:
Yelp:
APIs (Yelp Fusion) Overview:  https://docs.developer.yelp.com/docs/fusion-intro
    - Can get Businesses, Reviews, Events, Available Categories, Brands, and Autocomplete (typeahead search service)
    - Out of the above, the "Businesses Search" (https://docs.developer.yelp.com/reference/v3_business_search) seems most useful
        - Limitation:  Won't return any businesses without reviews
        - Accepts lat & long
        - Has Max 50 limit
    - "Businesses Reviews" (https://docs.developer.yelp.com/reference/v3_business_reviews) could be useful, but it returns at most 3 review excerpts and does not return businesses without reviews, which could be limiting.
        - Because it returns at most 3 reviews, it will skew our data: very popular businesses with more than 3 reviews won't be distinguishable from each other in the dataset.
        - It does not seem to return review ratings/numbers, so it is not easy to derive an accurate sentiment score for use in a model, either.
    - Events Search (https://docs.developer.yelp.com/reference/v3_events_search) could be useful to see if more events in the immediate station radius would impact how many 
        - TODO: Will need to determine if it gives all events across the year, or only within a certain timeframe around the request time (which is less useful for statistical model)
        - 50 limit may be limiting


Foursquare: 
APIs Overview:  https://location.foursquare.com/developer/reference/api-overview
    - Places API:  https://location.foursquare.com/developer/reference/places-api-overview#endpoints
        - Place Search (https://location.foursquare.com/developer/reference/place-search):  "Search for places in the FSQ Places database using a location and querying by name, category name, telephone number, taste label, or chain name. For example, search for "coffee" to get back a list of recommended coffee shops ... You may pass a location with your request by using one of the following options."
        - Place Details (https://location.foursquare.com/developer/reference/place-details):  "Retrieve comprehensive information and metadata for a FSQ Place using the fsq_id."
        - Place Photos (https://location.foursquare.com/developer/reference/place-photos):  "Retrieve photos for a FSQ Place using the fsq_id."
        - Place Tips (https://location.foursquare.com/developer/reference/place-tips):  "Retrieve tips for a FSQ Place using the fsq_id."
        - Place Match (https://location.foursquare.com/developer/reference/place-match): "Return the Foursquare record of a POI (via FSQ_ID) given a Name and Location. Provide a Location by using all the Address parameters, or by LL."
    - Studio Data API (geospatial assets - not useful to us for this exercise)
    - Geofence API (user-configured geofences - not useful to us for this exercise)

    - The Places API and MAYBE Place Tips would be the most useful to us.  The number of categories available is the next consideration.  The cap of 50 items per response could be a limiting factor.
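
To see how often the 50-item cap actually bites, one option is to count the rows returned per (station, query) pair and flag the pairs that reached the cap. The sketch below is only an illustration: the helper name `find_capped_queries` and the sample rows are made up, and it assumes each result row carries `station_id` and `query_categories` keys like the columns shown in the Yelp output above.

```python
from collections import Counter

RESPONSE_LIMIT = 50  # per-request item cap noted above


def find_capped_queries(rows, limit=RESPONSE_LIMIT):
    """Return {(station_id, query_categories): row_count} for queries that hit the cap.

    `rows` is an iterable of dicts, one per returned place (hypothetical shape).
    """
    counts = Counter((row["station_id"], row["query_categories"]) for row in rows)
    return {key: n for key, n in counts.items() if n >= limit}


# Made-up sample: station "A" reaches the cap for one query, station "B" does not.
sample_rows = (
    [{"station_id": "A", "query_categories": "13032"}] * 50
    + [{"station_id": "B", "query_categories": "13032"}] * 12
)

capped = find_capped_queries(sample_rows)
print(capped)
```

For a pandas DataFrame with those two columns, `df.to_dict("records")` yields rows in the shape this helper expects. Any pair that shows up here has a count that is a lower bound rather than a true total.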

OVERALL:
- Yelp has a better, richer dataset (example:  'coffee' for station_id 1 maxed out at the 50-result limit, whereas Foursquare '13032' for the same location returned fewer results, within the 50 limit).
- But I am NOT going to use Yelp: the API call limit (500 per day) means I cannot break the categories apart (I would run out of calls), so my combined queries are more likely to max out and the completeness of my dataset is questionable.  Running a linear regression on "number of bikes available for rent at a given station, as a function of how many POIs are within a 1km radius of the station" becomes problematic when many of the POI counts sit at 50 and we don't know how far over 50 the true counts are.
- Because I had more calls available to me on Foursquare, I could make separate calls for individual categories (especially breaking out "Outdoor"), and each call is less likely to max out because I'm not combining multiple subcategories together.  This dataset is more reliable even though it doesn't have ratings.  I will be using the Foursquare dataset primarily for my regression/statistical model building.
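
The call-budget argument above comes down to simple arithmetic: one request per station per category group. The sketch below is illustrative only. `N_STATIONS` and the group counts are hypothetical placeholders (the real values are not stated in this section), and the Foursquare figure of 40,000 is treated as a total allowance, which is an assumption.

```python
import math


def calls_needed(n_stations, n_category_groups):
    """One API request per station per category group."""
    return n_stations * n_category_groups


def days_to_finish(total_calls, calls_per_day):
    """Whole days needed to make total_calls under a daily quota."""
    return math.ceil(total_calls / calls_per_day)


N_STATIONS = 250            # hypothetical station count, for illustration only
YELP_DAILY_LIMIT = 500      # daily limit cited in the notes above
FOURSQUARE_LIMIT = 40_000   # assumed here to be a total allowance

for groups in (2, 6):       # hypothetical: few combined groups vs. many split ones
    total = calls_needed(N_STATIONS, groups)
    print(
        f"{groups} groups: {total} calls, "
        f"{days_to_finish(total, YELP_DAILY_LIMIT)} day(s) on Yelp, "
        f"fits Foursquare allowance: {total <= FOURSQUARE_LIMIT}"
    )
```

Splitting categories into more groups lowers the chance that any single response hits the 50-item cap, but it multiplies the number of requests, which is the trade-off that favoured Foursquare here.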

Get the top 10 restaurants according to their rating