# Project: Yelp WAX

## Part 1 - Understanding your data and question

You will be pulling data from the Yelp API to complete your analysis. The API, however, provides a lot of information that will not be pertinent to your analysis. You will pull data from the API and parse through it to keep only the data that you need. To help you identify that information, look at the API documentation and understand what data the API will provide.

Identify which data fields you will want to keep for your analysis. 

https://www.yelp.com/developers/documentation/v3/get_started

___

## Part 2 - Create ETL pipeline for the business data from the API

Now that you know what data you need from the API, you want to write code that will execute an API call, parse the results, and then insert the results into the DB.

It is helpful to break this up into three different functions (*API call, parse results, and insert into DB*) and then write a function/script that pulls the three functions together.

Let's first do this for the Business endpoint.

WAX:
https://www.yelp.com/developers/documentation/v3/business_search
ETL - extract, transform, load
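
As a rough illustration of the "parse results" (transform) step, here is a minimal sketch that flattens one business record into the seven columns used for the CSV in this notebook. The function name `parse_biz` is hypothetical and is not the helper from `helper_funcs`. The sketch assumes `zip_code` is nested under `location` and that `price` may be absent from a record.

```python
# Columns kept for each business (same headers as the CSV in this notebook)
BIZ_FIELDS = ['id', 'name', 'is_closed', 'review_count', 'zip_code', 'rating', 'price']

def parse_biz(business):
    """Flatten one raw business dict into the columns listed in BIZ_FIELDS.

    Hypothetical helper for illustration; missing fields become None.
    """
    return {
        'id': business.get('id'),
        'name': business.get('name'),
        'is_closed': business.get('is_closed'),
        'review_count': business.get('review_count'),
        # assumption: zip_code sits inside the nested 'location' dict
        'zip_code': (business.get('location') or {}).get('zip_code'),
        'rating': business.get('rating'),
        # assumption: some businesses have no price label
        'price': business.get('price'),
    }

# Made-up record, shaped like one item of the 'businesses' list
example = {
    'id': 'abc123',
    'name': 'Example Sushi',
    'is_closed': False,
    'review_count': 12,
    'rating': 4.5,
    'location': {'zip_code': '11201'},
}
row = parse_biz(example)
```

Using `.get()` throughout means a record with a missing field produces `None` instead of raising a `KeyError` partway through a long download.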

In [1]:
import requests
import json
import csv
import os
from helper_funcs import *


In [2]:
biz_url =  'https://api.yelp.com/v3/businesses/search'
rev_url = 'https://api.yelp.com/v3/businesses/{id}/reviews'
# GET https://api.yelp.com/v3/businesses/{id}/reviews

In [3]:
# what type of business do you want to search
# term = 'gym'
term = 'sushi'
#where do you want to perform this search
location = 'Brooklyn'
# what is your other parameter you want to search against
# categories = 'gyms'
categories = 'restaurants'
biz_filepath = '../data/biz_data.csv'
rev_filepath = '../data/rev_data.csv'

url_params = {
    "term": term.replace(' ', '+'),
    "location": location.replace(' ', '+'),
    "categories" : categories,
    "limit": 50,
}

In [4]:
# import os
# cwd = os.getcwd()
# cwd

In [5]:
# biz = yelp_call(biz_url, url_params)
# biz_parsed_ld = parse_biz_results_ld(biz['businesses'])
# csv_append(biz_filepath, biz_parsed_ld)

In [6]:
max_results = 50
fetch_biz_data(term, location, categories, biz_filepath, max_results)

Deleted existing data.csv file.
Created new data.csv file and added headers:
['id', 'name', 'is_closed', 'review_count', 'zip_code', 'rating', 'price']
Downloading data - - Done.
Successfully gathered listings for 1000 businesses of type 'restaurants', search term 'sushi', in 'Brooklyn'.


___

## Part 3 -  Create ETL pipeline for the restaurant review data from the API

You've done this for the businesses; now you need to do it for the reviews. You will follow the same process, but your functions will be specific to reviews. Above you have a model of the functions you will need to write and how to pull them together in one script. For this part, follow the process below.

- In order to pull the reviews, you will need the business ids. So your first step will be to get all of the business ids from your businesses csv. 

In [7]:
biz_imported_data = []
with open(biz_filepath, 'r', newline = '') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        biz_imported_data.append(row)

In [8]:
biz_ids = [biz['id'] for biz in biz_imported_data]

In [9]:
biz_ids

['SUHJfh7H2NiQx8c5OWoAyQ',
 'e3JALGueMcc2eLXRK8HPNQ',
 'LjAt-SP7BwIpp56TZHVsWA',
 'LjAt-SP7BwIpp56TZHVsWA',
 'LTfR_0NiS7OIRCfkaAJg9w',
 'LTfR_0NiS7OIRCfkaAJg9w',
 'KrJ6m_TkxBAPPSNH-G7rvQ',
 '0Lnp_fi3gI2bfJ8RcMDfjg',
 'AFt1Qcec4_JNr6PWpkRYyw',
 'EIcbGkl6bRtAi12zcjA7-A',
 '1yvEVWnJfodsReGHc8DuVQ',
 'T2fE7hGS83Ba-QNCOqbK4A',
 'Idx1__FUB7CqnhS4aQWGFQ',
 'ZrzSDDj54aUlPqn4MQMKeQ',
 'ldNAjLZ9sAM0PbPcJtz5Jg',
 'U3ysEBmvdZXKmTdc3MxhGA',
 '_lS5EcBhVur3zQ5wxryGYw',
 'MM5P9cKlzovYLcf5qf2SwQ',
 'MM5P9cKlzovYLcf5qf2SwQ',
 'B-lJd1eBLcLk1StNbgAe9w',
 'cL8TLMCbs2B-vgy3SFtPGg',
 '27ASh8-hTL5jp5d_WQGHxw',
 '27ASh8-hTL5jp5d_WQGHxw',
 '49k7dpa5cNgKM0TBb593IA',
 'kAM_S06FQlhtSSyrzrJb6w',
 'tT6teP3ZfCAfvdPdDoqG4A',
 'tT6teP3ZfCAfvdPdDoqG4A',
 '4UqmQ-zLrbICQFoDhr5XnQ',
 '4UqmQ-zLrbICQFoDhr5XnQ',
 'lQgMrqcMZghWd45VIhkUAA',
 '0nYfyel0UwlI1qXHOjTw0w',
 '_FICyzFLQxR7N62I6qU94A',
 'G4qZvheX74VwYvek7I8iQg',
 '91tQ4ToReVVCA5MTJi9gfw',
 'Xk2zYFCknLOIZRv8T9z-Dw',
 'HC18oDJ2svQoXIddP-wuyw',
 'EXEDl7BOLZksz3Uy_RVFhw',
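
The output above contains repeated ids (for example, `'LjAt-SP7BwIpp56TZHVsWA'` appears twice), so requesting reviews for every entry would spend some API calls on the same business more than once. One possible way to drop repeats while keeping the original order is sketched below. The variable names `sample_ids` and `unique_ids` are illustrative only.

```python
# A few ids copied from the output above, including one repeat
sample_ids = [
    'SUHJfh7H2NiQx8c5OWoAyQ',
    'e3JALGueMcc2eLXRK8HPNQ',
    'LjAt-SP7BwIpp56TZHVsWA',
    'LjAt-SP7BwIpp56TZHVsWA',
]

# dict keys are unique and keep insertion order, so this removes repeats
# without reordering the ids
unique_ids = list(dict.fromkeys(sample_ids))
```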
 

- Write a function that takes a business id and makes a call to the API for reviews
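
A minimal sketch of how the pieces of such a request could be assembled, assuming the API key is sent as a bearer token in the `Authorization` header. `build_review_request` and the placeholder key are hypothetical, and the sketch stops short of the network call so it can be run without credentials.

```python
REV_URL = 'https://api.yelp.com/v3/businesses/{id}/reviews'

def build_review_request(biz_id, api_key):
    """Return the URL and headers for one reviews request (no network call)."""
    url = REV_URL.format(id=biz_id)
    headers = {'Authorization': f'Bearer {api_key}'}
    return url, headers

url, headers = build_review_request('SUHJfh7H2NiQx8c5OWoAyQ', 'YOUR_API_KEY')

# With the requests library, the call itself would then look like:
#   response = requests.get(url, headers=headers)
#   reviews = response.json()['reviews']
```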


In [10]:
biz_imported_data;

In [20]:
results = get_review(biz_ids[0])

In [12]:
test = [{'id': 'uxJ3BdwfACraBeU_Mf3ITw',
 'url': 'https://www.yelp.com/biz/harbor-fitness-brooklyn-5?adjust_creative=-pEsRVwee9viyT1bikndGw&hrid=uxJ3BdwfACraBeU_Mf3ITw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=-pEsRVwee9viyT1bikndGw',
 'text': 'My neighborhood gym, been a member for over a decade. Staff is always kind and helpful...I especially appreciate the help of Lisa Lekacos for being of great...',
 'rating': 5,
 'time_created': '2021-01-27 15:05:34',
 'user': {'id': 'dj3scSObNzg_4HQ-5ZqnwQ',
  'profile_url': 'https://www.yelp.com/user_details?userid=dj3scSObNzg_4HQ-5ZqnwQ',
  'image_url': None,
  'name': 'Edward C.'}}]

In [13]:
results['reviews'][0]

{'id': 'wy0xAIOx2RpsFIGyX8rQFg',
 'url': 'https://www.yelp.com/biz/sushi-yashin-brooklyn?adjust_creative=-pEsRVwee9viyT1bikndGw&hrid=wy0xAIOx2RpsFIGyX8rQFg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=-pEsRVwee9viyT1bikndGw',
 'text': 'Fresh, delicious, quality ingredients. Pickup only but fast preparation time. Would definitely order again. Great for south/central Brooklyn eaters',
 'rating': 5,
 'time_created': '2020-12-22 07:55:26',
 'user': {'id': '5BaFA6CNpQA675TgQVuV7w',
  'profile_url': 'https://www.yelp.com/user_details?userid=5BaFA6CNpQA675TgQVuV7w',
  'image_url': 'https://s3-media1.fl.yelpcdn.com/photo/RvZUtgKIgz0NID3rtzxW8g/o.jpg',
  'name': 'Hilary W.'}}

- Write a function to parse out the relevant information from the reviews
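
One way such a parser might look is sketched below. The review records shown above do not carry the business id as a field, so this sketch passes it in and stores it on every row, which makes it possible to match reviews to businesses later. `parse_reviews` and its column names are hypothetical, not the `parse_rev_results_ld` helper used in this notebook.

```python
def parse_reviews(reviews, biz_id):
    """Flatten raw review dicts into rows, tagging each with its business id."""
    rows = []
    for review in reviews:
        rows.append({
            'review_id': review.get('id'),
            'biz_id': biz_id,
            'rating': review.get('rating'),
            'time_created': review.get('time_created'),
            'text': review.get('text'),
            # 'user' is a nested dict in the response
            'user_id': (review.get('user') or {}).get('id'),
        })
    return rows

# Shortened copy of the review shown above
sample = [{
    'id': 'wy0xAIOx2RpsFIGyX8rQFg',
    'text': 'Fresh, delicious, quality ingredients.',
    'rating': 5,
    'time_created': '2020-12-22 07:55:26',
    'user': {'id': '5BaFA6CNpQA675TgQVuV7w', 'name': 'Hilary W.'},
}]
rows = parse_reviews(sample, 'SUHJfh7H2NiQx8c5OWoAyQ')
```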

In [14]:
# parsed_rev_results_ld = test

In [15]:
# parsed_rev_results_ld = parse_rev_results_ld(test)

In [16]:
parsed_rev_results_ld = parse_rev_results_ld(results['reviews'])

- Write a function to save the parsed data into a csv file containing all of the reviews.
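
A possible shape for the saving step, using only the standard `csv` module. `save_rows` is a hypothetical name and merges the create and append cases into one function: it writes the header only when the file does not exist yet or is empty.

```python
import csv
import os

def save_rows(filepath, rows, fieldnames):
    """Append dict rows to a CSV, writing the header only for a new or empty file."""
    needs_header = not os.path.exists(filepath) or os.path.getsize(filepath) == 0
    with open(filepath, 'a', newline='', encoding='utf-8') as csvfile:
        # extrasaction='ignore' silently drops keys that are not in fieldnames
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, extrasaction='ignore')
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling `save_rows` once per business then builds up a single file with one header row, regardless of which call happens to be the first.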

In [17]:
# csv_create(rev_filepath, parsed_rev_results_ld)
# csv_append(rev_filepath, parsed_rev_results_ld)

In [19]:
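# Create the file from the first business's parsed reviews, then append the rest
csv_create(rev_filepath, parse_rev_results_ld(get_review(biz_ids[0])['reviews']))
for biz_id in biz_ids[1:]:
    csv_append(rev_filepath, parse_rev_results_ld(get_review(biz_id)['reviews']))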

- Combine the functions above into a single script  
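
As a sketch of how the combined script could be organized, the function below takes the three steps as arguments, so each one can be swapped or tested separately. All names here are hypothetical. `fetch(biz_id)` is assumed to return a dict with a `'reviews'` list, `parse(reviews, biz_id)` a list of rows, and `save(rows)` to write them out.

```python
def run_review_pipeline(biz_ids, fetch, parse, save):
    """Fetch, parse, and save reviews for each business id.

    Returns the total number of rows saved.
    """
    saved = 0
    for biz_id in biz_ids:
        raw = fetch(biz_id)                           # extract
        rows = parse(raw.get('reviews', []), biz_id)  # transform
        if rows:
            save(rows)                                # load
            saved += len(rows)
    return saved

# Stand-in functions so the flow can be tried without calling the API
collected = []
count = run_review_pipeline(
    ['a', 'b'],
    fetch=lambda biz_id: {'reviews': [{'id': biz_id + '-r1'}]},
    parse=lambda reviews, biz_id: [{'review_id': r['id'], 'biz_id': biz_id} for r in reviews],
    save=collected.extend,
)
```

Note that the loop passes each individual `biz_id` to `fetch`, not the whole list and not a fixed index.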

___

## Part 4 -  Using Python and pandas, write code to answer the questions below. 


- Which are the 5 most reviewed businesses in your dataset?
- What is the highest rating received in your data set and how many businesses have that rating?
- What percentage of businesses have a rating greater than or equal to 4.5?
- What percentage of businesses have a rating less than 3?
- What percentage of your businesses have a price label of one dollar sign? Two dollar signs? Three dollar signs? No dollar signs?
- Return the text of the reviews for the most reviewed business. 
- Find the highest rated business and return text of the most recent review. If multiple businesses have the same rating, select the business with the most reviews. 
- Find the lowest rated business and return text of the most recent review. If multiple businesses have the same rating, select the business with the fewest reviews. 
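
To show the kind of pandas operations these questions call for, here is a small sketch on a made-up four-row table. The data is invented for illustration and the column names follow the CSV headers used earlier. With real data you would load the table with something like `pd.read_csv(biz_filepath)` instead.

```python
import pandas as pd

# Made-up data, for illustration only
biz = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'review_count': [120, 45, 300, 8],
    'rating': [4.5, 3.0, 5.0, 5.0],
    'price': ['$$', '$', None, '$$'],
})

# Most reviewed businesses (top 2 here; use 5 on the real data)
top_reviewed = biz.nlargest(2, 'review_count')[['name', 'review_count']]

# Highest rating and how many businesses have it
best_rating = biz['rating'].max()
n_best = int((biz['rating'] == best_rating).sum())

# Percentage of businesses rated 4.5 or higher:
# the mean of a boolean column is the fraction of True values
pct_high = (biz['rating'] >= 4.5).mean() * 100

# Share of each price label, counting a missing label as its own group
price_share = biz['price'].fillna('none').value_counts(normalize=True) * 100
```

The remaining questions combine the same tools: filter or sort the business table to pick one business id, then filter the reviews table by that id and sort by `time_created`.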


In [None]:
import pandas as pd
df = pd.DataFrame(parsed_results, columns = ['Name', 'Zip Code', 'Rating'])
df.set_index('Name',inplace = True)

___

# Reference help

###  Pagination

Returning to the Yelp API, the [documentation](https://www.yelp.com/developers/documentation/v3/business_search) also describes the API limits. These typically cover the number of requests a user may make within a given time window and the maximum number of results returned. In this case, each request returns at most 50 results and defaults to 20. Furthermore, any search is limited to a total of 1000 results. To retrieve all 1000 of these results, we have to page through them piece by piece, retrieving 50 at a time. This process is often referred to as pagination.

Once you have an initial response, you can examine the contents of the JSON container. For example, you might start with `response.json().keys()`. Here, you'll see a key for `'total'`, which tells you the full number of matching results for your query parameters. Write a loop (or ideally a function) that makes successive API calls using the `offset` parameter to retrieve all of the results (or as many as the 1000-result search limit allows for a particularly large result set) for the original query. As you do this, be mindful of how you store the data. 

**Note: be mindful of the API rate limits. You can only make 5000 requests per day, and a loop can send requests faster than the API allows. Prototype on a small scale before running a loop that could be faulty. You can also use `time.sleep(n)` to add delays. For more details see https://www.yelp.com/developers/documentation/v3/rate_limiting.**
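
The paging loop described above might be sketched as follows. `fetch_all` and `fetch_page` are hypothetical names. `fetch_page(offset, limit)` stands for whatever function performs one API call (for example, a `requests.get` with `offset` and `limit` added to the query parameters) and returns the decoded JSON dict. Passing it in as an argument lets the loop be tried against fake data before it touches the real API.

```python
import time

def fetch_all(fetch_page, limit=50, max_results=1000, pause=0.0):
    """Collect businesses page by page using the offset parameter.

    fetch_page(offset, limit) must return a dict with 'businesses' and 'total'.
    pause is the number of seconds to sleep between requests.
    """
    results = []
    offset = 0
    while offset < max_results:
        # never ask for more than remains under the cap
        page_size = min(limit, max_results - offset)
        page = fetch_page(offset, page_size)
        businesses = page.get('businesses', [])
        results.extend(businesses)
        total = min(page.get('total', 0), max_results)
        offset += page_size
        # stop on an empty page or once every available result is collected
        if not businesses or offset >= total:
            break
        time.sleep(pause)
    return results

# Try the loop against fake data before using a real API call
fake_data = [{'id': i} for i in range(120)]
calls = []

def fake_fetch_page(offset, limit):
    calls.append((offset, limit))
    return {'total': len(fake_data), 'businesses': fake_data[offset:offset + limit]}

everything = fetch_all(fake_fetch_page)
```

With 120 fake results the loop makes three calls, at offsets 0, 50, and 100. Against the real API, a non-zero `pause` spaces the requests out.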