# Yelp Fusion API

With millions of business updates every month, Yelp Fusion delivers the most current and most accurate local data available. Choose from dozens of attributes per business, and as millions of new reviews and photos are added by active Yelp users, the Yelp data set remains unparalleled in its rich detail, freshness, and accuracy.

For our project, we pull restaurant data for New York City by zip code. These zip codes correspond to those in the housing price data downloaded from [Redfin](https://www.redfin.com/news/data-center/) and [Zillow](https://www.zillow.com/research/data/).

We limited the analysis to cities with at least 10 zip codes.
The API code below was run 10 times to pull data for a total of 6014 zip codes.

**Data Source** : [Yelp Fusion API](https://www.yelp.com/developers/v3/manage_app)
Note: This requires an authorized API key from Yelp Fusion.

##### Yelp Fusion API limitations
Yelp returns at most 50 results per request and at most 1,000 results per search, with 5,000 API calls allowed every 24 hours.
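
To make these limits concrete, here is a minimal sketch of the paging arithmetic. The constants simply restate the numbers quoted above and are not checked against Yelp's current documentation; the helper names `page_offsets` and `zip_codes_per_day` are hypothetical and do not appear elsewhere in this notebook.

```python
# Limits as quoted in this notebook (assumptions, not verified against Yelp).
MAX_PER_REQUEST = 50     # results returned by one request
MAX_PER_SEARCH = 1000    # results reachable for one search
DAILY_CALLS = 5000       # API calls allowed per 24 hours


def page_offsets(wanted, per_request=MAX_PER_REQUEST, cap=MAX_PER_SEARCH):
    """Return the offset of each request needed to collect `wanted` results."""
    wanted = min(wanted, cap)  # nothing beyond the per-search cap is reachable
    return list(range(0, wanted, per_request))


def zip_codes_per_day(wanted, daily_calls=DAILY_CALLS):
    """Return how many zip codes fit in one day's call budget (wanted > 0)."""
    calls_per_zip = len(page_offsets(wanted))
    return daily_calls // calls_per_zip


print(page_offsets(150))       # [0, 50, 100]
print(zip_codes_per_day(150))  # 1666
```

Under these assumptions, 150 restaurants per zip code costs three calls, so one day's budget covers 1,666 zip codes.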

##### Method
We created a list of zip codes for each borough from the housing price dataset. To work within the Yelp API limits of 1,000 results per search and 50 results per request, we adapted a function written by rspiro9 in "Yelp vs Inspection Analysis for NYC". We pulled 150 restaurants per zip code.
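
The paging approach described above could be sketched roughly as follows. This is an illustration only, not rspiro9's function or the exact code used in this notebook. The name `fetch_pages` and its parameters are hypothetical, and `get` is assumed to be any callable that behaves like `requests.get`, passed in so the sketch can be tried without network access.

```python
def fetch_pages(get, url, headers, zip_code, pages=3, page_size=50):
    """Collect up to pages * page_size businesses for one zip code.

    `get` is assumed to accept (url, headers=..., params=...) and return an
    object with a .json() method, as requests.get does.
    """
    businesses = []
    for page in range(pages):
        params = {
            'location': zip_code,
            'term': 'Restaurants',
            'limit': page_size,
            'offset': page * page_size,  # skip the results already collected
        }
        response = get(url, headers=headers, params=params)
        batch = response.json().get('businesses', [])
        businesses.extend(batch)
        if len(batch) < page_size:
            # A short page suggests there are no further results to request.
            break
    return businesses
```

With the `url` and `headers` defined in the cells below, a call might look like `fetch_pages(requests.get, url, headers, '10001')`, which would request three pages of 50 results for that zip code.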


### Note: This notebook was run 10 times to pull a total of 6014 zip codes

In [1]:
import requests
import pandas as pd
import pickle

In [3]:
api_key = 'YOUR_API_KEY'  # paste your own Yelp Fusion API key here; never commit a real key

In [4]:
# Using the yelp business search API: https://www.yelp.com/developers/documentation/v3/business_search

# headers contain the api key.
headers = {'Authorization': 'Bearer {}'.format(api_key)}
url = 'https://api.yelp.com/v3/businesses/search'

In [7]:
## List of zip codes per NYC borough. Change this list for every borough. 
neighborhoods = ["zip codes go here"]

In [8]:
## Create one empty list per zip code to hold the raw API responses:
nyc = [[] for _ in neighborhoods]

In [None]:
# Loop to pull data for each zip code (5 pages of 50 results each):
for x in range(len(neighborhoods)):
    print('---------------------------------------------')
    print('Gathering Data for {}'.format(neighborhoods[x]))
    print('---------------------------------------------')
    
    for y in range(5):
        location = neighborhoods[x]
        term = "Restaurants"
        search_limit = 50
        offset = 50 * y
        categories = "(restaurants, All)"
        sort_by = 'distance'
        url_params = {
            'location': location.replace(' ', '+'),
            'term': term,
            'limit': search_limit,
            'offset': offset,
            'categories': categories,
            'sort_by': sort_by,
        }
        
        response = requests.get(url, headers=headers, params=url_params)
        print('***** {} Restaurants #{} - #{} ....{}'.format(neighborhoods[x], 
                                                             offset+1, offset+search_limit,
                                                             response))
        nyc[x].append(response)

# Inspect the last response received:
print(response)
print(type(response.text))
print(response.json().keys())
print(response.text[:1000])
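
Since the loop above stores every response whether or not the call succeeded, a small guard like the one below could be used when reading them back. It is a sketch under the assumption that a successful search returns HTTP status 200 and a JSON body with a `businesses` key; the helper name `businesses_from` is hypothetical.

```python
def businesses_from(response):
    """Return the list of businesses in a response, or [] if the call failed."""
    if response.status_code != 200:
        # Assumed: any non-200 status (e.g. rate limiting) carries no results.
        return []
    return response.json().get('businesses', [])
```

For example, `businesses_from(nyc[0][0])` would give the first page of results for the first zip code, or an empty list if that request failed.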

In [None]:
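# Report any page that returned fewer than 50 businesses:
for x in range(len(neighborhoods)):
    # Iterate over the pages actually stored for this zip code.
    for y in range(len(nyc[x])):
        try:
            num = len(nyc[x][y].json()['businesses'])
        except (ValueError, KeyError):
            # Body was not valid JSON or had no 'businesses' key.
            print("Invalid data. Skipping entry...")
            continue
        if num != 50:
            print(neighborhoods[x], y, num)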

In [None]:
# Compile the data into a single dataframe, skipping any empty pages:
frames = []
for x in range(len(neighborhoods)):
    for y in range(len(nyc[x])):
        try:
            df_temp = pd.DataFrame.from_dict(nyc[x][y].json()['businesses'])
        except (ValueError, KeyError):
            print("Invalid data. Skipping entry...")
            continue
        if not df_temp.empty:
            df_temp.loc[:, 'neighborhood'] = neighborhoods[x]
            frames.append(df_temp)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
df = pd.concat(frames) if frames else pd.DataFrame()

In [12]:
df.shape

(58966, 17)

#### Saving file as pickle
* The pickle module keeps track of the objects it has already serialized, so later references to the same object are not serialized again, which speeds up execution.
* Saving takes very little time.
* It works well for small objects like the dataframe we used.

Save each pull as a separate pickle file.

In [13]:
# Save Dataset: (data pulled 3/28/22)
with open('Yelp_API/data_pull_six.pickle', 'wb') as f:
    pickle.dump(df, f)
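
Because each pull is saved to its own pickle file, the files need to be read back and stacked before analysis. The sketch below shows one way this might be done. The function name `combine_pulls` is hypothetical, and the pattern `data_pull_*.pickle` is an assumption based on the single file name shown above.

```python
import glob
import os

import pandas as pd


def combine_pulls(folder, pattern='data_pull_*.pickle'):
    """Load every pickled dataframe in `folder` and stack them into one."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    frames = [pd.read_pickle(path) for path in paths]
    # pd.concat raises ValueError if no files matched the pattern.
    return pd.concat(frames, ignore_index=True)
```

A call such as `combine_pulls('Yelp_API')` would then return a single dataframe with a fresh index. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.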