# Data Aggregation
As a first step in the Places to Go Demo, we need static venue data to generate recommendations from. In production, our venue sources will be managed by a web scraper bot that crawls social media and updates the list based on activity. For now, we will populate a static set of 2,500 locations sourced from 5 cities.

The five cities that have been requested by the client for the demo are:
- **New York**
- **Scottsdale**
- **Miami**
- **Los Angeles**
- **Chicago**

We will use the Yelp API to gather the top 500 rated locations in each city. We will then feed the `name` and `categories` fields of each response to the AI model, which will seek to associate each venue with a list of keywords.
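As a rough illustration of the model input described above, the sketch below reduces a business record to its `name` and category titles. The helper name `to_model_input` and the sample record are hypothetical. The sketch assumes the Yelp response shape in which `categories` is a list of objects with `alias` and `title` keys.

```python
def to_model_input(business: dict) -> dict:
    """Keep only the fields passed to the model: the name and the category titles."""
    return {
        "name": business["name"],
        # Assumes Yelp-style categories: [{"alias": ..., "title": ...}, ...]
        "categories": [category["title"] for category in business.get("categories", [])],
    }

# Hypothetical sample record, shaped like a Yelp business object
sample_business = {
    "id": "example-id",
    "name": "Example Rooftop Bar",
    "categories": [
        {"alias": "bars", "title": "Bars"},
        {"alias": "cocktailbars", "title": "Cocktail Bars"},
    ],
    "rating": 4.5,
}

model_input = to_model_input(sample_business)
```

Dropping the other fields keeps each prompt small, which matters when the same prompt is run over 2,500 businesses.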

We will need to take the following steps to achieve our task:
1. Gather JSON objects for the top 500 locations in each city
2. Extract an exhaustive list of all categories from the 2,500 locations
3. Provide the list to ChatGPT and prompt it to create a list of 20 keywords for each archetype
4. Design a prompt for associating businesses with keywords based on the `name` and `categories` fields
5. Run the list of 2,500 businesses through the prompt and store the results in a JSON file
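Step 2 can be sketched as follows. This is a minimal example, not the notebook's final implementation: `extract_categories` and the sample records are hypothetical, and the code assumes each location carries Yelp-style `categories` entries with a `title` key.

```python
from typing import Iterable

def extract_categories(locations: Iterable[dict]) -> list[str]:
    """Return the sorted, de-duplicated category titles found across all locations."""
    titles = set()
    for location in locations:
        # Assumes Yelp-style categories: [{"alias": ..., "title": ...}, ...]
        for category in location.get("categories", []):
            titles.add(category["title"])
    return sorted(titles)

# Hypothetical sample records, shaped like Yelp business objects
sample_locations = [
    {"name": "Example Museum", "categories": [{"alias": "museums", "title": "Museums"}]},
    {
        "name": "Example Bar",
        "categories": [
            {"alias": "bars", "title": "Bars"},
            {"alias": "museums", "title": "Museums"},
        ],
    },
]

all_categories = extract_categories(sample_locations)
```

Sorting the result makes the category list stable between runs, which is convenient when it is pasted into a prompt.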

In [9]:
import os
from dotenv import load_dotenv
load_dotenv("../.env")

YELP_API_KEY = os.getenv("YELP_API_KEY")
TRIP_ADVISOR_API_KEY = os.getenv("TRIP_ADVISOR_API_KEY")

## 1. Gather JSON Data of Locations
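We start by using the `/businesses/search` endpoint of the Yelp Fusion API to gather up to 500 locations in each of our 5 cities: 10 search terms with 50 results per request. Results are ordered by Yelp's `best_match` sort, and the same business can appear under more than one search term, so duplicates are removed by `id` when the data is stored. We store these responses directly in JSON files to retrieve in future steps.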
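A search URL for this endpoint could be assembled as in the sketch below. The helper `build_search_url` is hypothetical. It uses `urllib.parse.urlencode` so that city names can be written with plain spaces instead of hand-encoded `%20` sequences. The constants are repeated here only to keep the example self-contained.

```python
from urllib.parse import quote, urlencode

API_URL = "https://api.yelp.com/v3"
BUSINESS_SEARCH_ENDPOINT = "/businesses/search"

def build_search_url(city: str, term: str, limit: int = 50) -> str:
    """Build a business search URL with properly encoded query parameters."""
    query = urlencode(
        {
            "location": city,
            "term": term,
            "limit": limit,
            "sort_by": "best_match",
            "locale": "en_US",
        },
        quote_via=quote,  # Encode spaces as %20 rather than +
    )
    return f"{API_URL}{BUSINESS_SEARCH_ENDPOINT}?{query}"

example_url = build_search_url("New York City", "museum")
```

With this approach the city list could hold human-readable names, and the encoded form would only exist inside the request URL.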

In [14]:
SEARCH_TERMS = ["tour", "activity", "experience", "resturant", "bar", "nightclub", "explore", "adventure", "museum", "nature"]

In [16]:
import requests
from tqdm import tqdm
from typing import Optional

# List of cities to search
CITIES = ["New%20York%20City", "Scottsdale", "Miami", "Los%20Angeles", "Chicago"]
CITY_CODES = ["NYC", "SCOTTSDALE", "MIAMI", "LA", "CHICAGO"]

CITY_TO_CODE = dict(zip(CITIES, CITY_CODES))

# Yelp Fusion API URL
API_URL = "https://api.yelp.com/v3"
BUSINESS_SEARCH_ENDPOINT = "/businesses/search"

# Search Params For API Request
LIMIT = 50
SORT_BY = "best_match"
LOCALE = "en_US"

# Authorization
HEADERS = {
    "Authorization": "Bearer " + YELP_API_KEY,
}

def request_city_data(city: str):
    """Request data from the Yelp API for a given (URL-encoded) city name"""
    # Note: the query parameter is `locale`, not `local`
    base_url = f"{API_URL}{BUSINESS_SEARCH_ENDPOINT}?location={city}&limit={LIMIT}&sort_by={SORT_BY}&locale={LOCALE}"
    data = []
    for search_term in SEARCH_TERMS:
        url = base_url + f"&term={search_term}"
        response = requests.get(url, headers=HEADERS)
        response.raise_for_status()  # Fail loudly on HTTP errors (e.g. bad key, rate limit)
        businesses = response.json().get("businesses", [])
        # Tag each result with the city it was found for
        for business in businesses:
            business["city"] = city
        data.extend(businesses)

        # Log per-term results
        print(f"Found {len(businesses)} results for {city} with term {search_term}")

    # Log City Results
    print(f"Found {len(data)} results for {city}")
    return data

def aggregate_city_data():
    """Aggregate data from all cities"""
    data = []
    for city in tqdm(CITIES):
        data.extend(request_city_data(city))

    # Reset the cursor so the tqdm progress bar output stays clean
    print("\r", end="")
    return data

In [17]:
activity_data = aggregate_city_data()

  0%|          | 0/5 [00:00<?, ?it/s]

Found 50 results for New%20York%20City with term tour
Found 50 results for New%20York%20City with term activity
Found 50 results for New%20York%20City with term experience
Found 50 results for New%20York%20City with term resturant
Found 50 results for New%20York%20City with term bar
Found 50 results for New%20York%20City with term nightclub
Found 50 results for New%20York%20City with term explore
Found 50 results for New%20York%20City with term adventure
Found 50 results for New%20York%20City with term museum


 20%|██        | 1/5 [00:07<00:30,  7.53s/it]

Found 50 results for New%20York%20City with term nature
Found 500 results for New%20York%20City
Found 50 results for Scottsdale with term tour
Found 50 results for Scottsdale with term activity
Found 50 results for Scottsdale with term experience
Found 50 results for Scottsdale with term resturant
Found 50 results for Scottsdale with term bar
Found 50 results for Scottsdale with term nightclub
Found 50 results for Scottsdale with term explore
Found 50 results for Scottsdale with term adventure
Found 50 results for Scottsdale with term museum


 40%|████      | 2/5 [00:14<00:21,  7.08s/it]

Found 50 results for Scottsdale with term nature
Found 500 results for Scottsdale
Found 50 results for Miami with term tour
Found 50 results for Miami with term activity
Found 50 results for Miami with term experience
Found 50 results for Miami with term resturant
Found 50 results for Miami with term bar
Found 50 results for Miami with term nightclub
Found 50 results for Miami with term explore
Found 50 results for Miami with term adventure
Found 50 results for Miami with term museum


 60%|██████    | 3/5 [00:22<00:15,  7.54s/it]

Found 50 results for Miami with term nature
Found 500 results for Miami
Found 50 results for Los%20Angeles with term tour
Found 50 results for Los%20Angeles with term activity
Found 50 results for Los%20Angeles with term experience
Found 50 results for Los%20Angeles with term resturant
Found 50 results for Los%20Angeles with term bar
Found 50 results for Los%20Angeles with term nightclub
Found 50 results for Los%20Angeles with term explore
Found 50 results for Los%20Angeles with term adventure
Found 50 results for Los%20Angeles with term museum


 80%|████████  | 4/5 [00:30<00:07,  7.84s/it]

Found 50 results for Los%20Angeles with term nature
Found 500 results for Los%20Angeles
Found 50 results for Chicago with term tour
Found 50 results for Chicago with term activity
Found 50 results for Chicago with term experience
Found 50 results for Chicago with term resturant
Found 50 results for Chicago with term bar
Found 50 results for Chicago with term nightclub
Found 50 results for Chicago with term explore
Found 50 results for Chicago with term adventure
Found 50 results for Chicago with term museum


100%|██████████| 5/5 [00:37<00:00,  7.59s/it]

Found 50 results for Chicago with term nature
Found 500 results for Chicago





In [22]:
import json
# APPEND new locations to the stored location data -- DANGEROUS: overwrites the file in place

with open("../data/searched_location_data.json", "r", encoding="utf-8") as f:
    location_data = json.load(f)

total_locations = activity_data + location_data

# Deduplicate by Yelp business id, keeping the first occurrence of each id
unique_locations = {}
for location in total_locations:
    unique_locations.setdefault(location["id"], location)
locations = list(unique_locations.values())

# Attach the short city code before opening the file for writing,
# so a missing city key cannot leave a truncated file behind
for loc in locations:
    loc["city_code"] = CITY_TO_CODE[loc["city"]]

with open("../data/searched_location_data.json", "w", encoding="utf-8") as f:
    json.dump(locations, f, ensure_ascii=False, indent=4)
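Because this cell rewrites the data file in place, one way to reduce the risk is to write to a temporary file and swap it in only after the write succeeds. The sketch below is an optional hardening step, not part of the original notebook. `write_json_atomically` is a hypothetical helper, and the demo writes to a throwaway directory, not to `../data`.

```python
import json
import os
import tempfile

def write_json_atomically(records: list, path: str) -> None:
    """Write records to a temp file next to `path`, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(records, f, ensure_ascii=False, indent=4)
        # os.replace swaps the file in a single step on the same filesystem
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file and leave the original untouched
        os.remove(tmp_path)
        raise

# Demo against a throwaway directory with a hypothetical record
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "locations.json")
    write_json_atomically([{"id": "abc", "city_code": "NYC"}], demo_path)
    with open(demo_path, encoding="utf-8") as f:
        round_tripped = json.load(f)
```

If serialization fails partway through, the existing file is left as it was, so a failed run does not destroy previously gathered results.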