# Getting Your Data From Yelp!

In order to make sure you are on track to completing the project, you will complete this workbook first. Below are steps that you need to take in order to make sure you have your data from yelp and are ready to analyze it. Your cohort lead will review this workbook with you the Wednesday before your project is due.    

# Part 1 - Understanding your data and question

You will be pulling data from the Yelp API to complete your analysis. The API, however, provides you with a lot of information that will not be pertinent to your analysis. You will pull data from the API and parse through it to keep only the data that you will need. In order to help you identify that information,look at the API documentation and understand what data the API will provide you. 

Identify which data fields you will want to keep for your analysis. 

https://www.yelp.com/developers/documentation/v3/get_started

# Part 2 - Create ETL pipeline for the business data from the API

## Details

Now that you know what data you need from the API, you want to write code that will execute an API call, parse those results and then insert the results into the DB.  

It is helpful to break this up into three different functions (*API call, parse results, and insert into DB*) and then you can write a function/script that pull the other three functions together. 

Let's first do this for the Business endpoint.

## Request

### Imports and Setup

In [1]:
import requests
import pandas as pd
import json
import csv

!pip install pandas_profiling

import pandas_profiling

with open(r'C:\Users\bmcca\.secret\yelp_api.json') as f:
    keys = json.load(f)

client_id = keys['id']
yelp_key = keys['key']

ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

phik 0.11.2 requires scipy>=1.5.2, but you'll have scipy 1.5.0 which is incompatible.


Collecting pandas_profiling
  Downloading pandas_profiling-2.13.0-py2.py3-none-any.whl (245 kB)
Collecting tangled-up-in-unicode>=0.0.6
  Downloading tangled_up_in_unicode-0.1.0-py3-none-any.whl (3.1 MB)
Collecting htmlmin>=0.1.12
  Using cached htmlmin-0.1.12.tar.gz (19 kB)
Collecting phik>=0.11.1
  Downloading phik-0.11.2.tar.gz (1.1 MB)
Collecting confuse>=1.0.0
  Downloading confuse-1.4.0-py2.py3-none-any.whl (21 kB)
Collecting visions[type_image_path]==0.7.1
  Downloading visions-0.7.1-py3-none-any.whl (102 kB)
Collecting missingno>=0.4.2
  Using cached missingno-0.4.2-py3-none-any.whl (9.7 kB)
Collecting multimethod==1.4
  Downloading multimethod-1.4-py2.py3-none-any.whl (7.3 kB)
Collecting bottleneck
  Downloading Bottleneck-1.3.2.tar.gz (88 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel m

### ƒ: yelp_request

 - Params: search term (eg. "wineries); location; yelp_key variable (from Imports); and changing setting to print details

In [2]:
def yelp_request(term, location, yelp_key, verbose=True):
    '''Adapted from Yelp API Lab: https://github.com/BenJMcCarty/dsc-yelp-api-lab/tree/solution'''
    
    url = 'https://api.yelp.com/v3/businesses/search'

    headers = {
            'Authorization': 'Bearer {}'.format(yelp_key),
        }

    url_params = {
                    'term': term.replace(' ', '+'),
                    'location': location.replace(' ', '+'),
                    'limit': 5
                }
    response = requests.get(url, headers=headers, params=url_params)
    
    if verbose == True:
        print(response)
        print(type(response.text))
        print(response.text[:1000])
        
    return response.json()

### Sending the request and saving the response

- Un-comment the next line to run the response

In [3]:
response = yelp_request('winery','Southern California', yelp_key)
response.keys()

<Response [200]>
<class 'str'>
{"businesses": [{"id": "4N2pC-h3Ba6qUQ7cN4xP-g", "alias": "san-antonio-winery-ontario", "name": "San Antonio Winery", "image_url": "https://s3-media2.fl.yelpcdn.com/bphoto/__5IBirgkaHoRGdVnUIdng/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/san-antonio-winery-ontario?adjust_creative=eyY_X0k6PMB2bu6EM1dXdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=eyY_X0k6PMB2bu6EM1dXdw", "review_count": 248, "categories": [{"alias": "wineries", "title": "Wineries"}], "rating": 4.0, "coordinates": {"latitude": 34.022224, "longitude": -117.558985}, "transactions": [], "price": "$$", "location": {"address1": "2802 S Milliken Ave", "address2": null, "address3": "", "city": "Ontario", "zip_code": "91761", "country": "US", "state": "CA", "display_address": ["2802 S Milliken Ave", "Ontario, CA 91761"]}, "phone": "+19099473995", "display_phone": "(909) 947-3995", "distance": 26348.268371330654}, {"id": "Q0Cuh5XZY29AdrWpUv8yhw", "alias": 

dict_keys(['businesses', 'total', 'region'])

#### Saving/Loading as JSON for simplicity while iterating

In [9]:
# try:
#     with open(r'data\response.txt', 'w') as f:
#         json.dump(response, f)
# except IOError:
#     print("I/O error")

In [10]:
# with open('data/response.txt') as json_file:
#     data = json.load(json_file)

### Identifying and Exploring Keys

In [4]:
# Identify keys

print(response.keys())

dict_keys(['businesses', 'total', 'region'])


#### Exploring the "Businesses" Key

In [5]:
response['businesses']

[{'id': '4N2pC-h3Ba6qUQ7cN4xP-g',
  'alias': 'san-antonio-winery-ontario',
  'name': 'San Antonio Winery',
  'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/__5IBirgkaHoRGdVnUIdng/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/san-antonio-winery-ontario?adjust_creative=eyY_X0k6PMB2bu6EM1dXdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=eyY_X0k6PMB2bu6EM1dXdw',
  'review_count': 248,
  'categories': [{'alias': 'wineries', 'title': 'Wineries'}],
  'rating': 4.0,
  'coordinates': {'latitude': 34.022224, 'longitude': -117.558985},
  'transactions': [],
  'price': '$$',
  'location': {'address1': '2802 S Milliken Ave',
   'address2': None,
   'address3': '',
   'city': 'Ontario',
   'zip_code': '91761',
   'country': 'US',
   'state': 'CA',
   'display_address': ['2802 S Milliken Ave', 'Ontario, CA 91761']},
  'phone': '+19099473995',
  'display_phone': '(909) 947-3995',
  'distance': 26348.268371330654},
 {'id': 'Q0Cuh5XZY29AdrWpUv8yhw',
  '

In [6]:
# Show first item w/in list of businesses

response['businesses'][0]

{'id': '4N2pC-h3Ba6qUQ7cN4xP-g',
 'alias': 'san-antonio-winery-ontario',
 'name': 'San Antonio Winery',
 'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/__5IBirgkaHoRGdVnUIdng/o.jpg',
 'is_closed': False,
 'url': 'https://www.yelp.com/biz/san-antonio-winery-ontario?adjust_creative=eyY_X0k6PMB2bu6EM1dXdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=eyY_X0k6PMB2bu6EM1dXdw',
 'review_count': 248,
 'categories': [{'alias': 'wineries', 'title': 'Wineries'}],
 'rating': 4.0,
 'coordinates': {'latitude': 34.022224, 'longitude': -117.558985},
 'transactions': [],
 'price': '$$',
 'location': {'address1': '2802 S Milliken Ave',
  'address2': None,
  'address3': '',
  'city': 'Ontario',
  'zip_code': '91761',
  'country': 'US',
  'state': 'CA',
  'display_address': ['2802 S Milliken Ave', 'Ontario, CA 91761']},
 'phone': '+19099473995',
 'display_phone': '(909) 947-3995',
 'distance': 26348.268371330654}

In [7]:
response['businesses'][0]['categories'][0]['alias']

'wineries'

In [8]:
response['businesses'][0]['categories'][0]['title']

'Wineries'

#### Exploring the "Total" Key

In [9]:
response['total']

# How many businesses are there in total for my request?

493

#### Exploring the "Region" Key

In [11]:
response['region']

# From which geographical area will my results come?

{'center': {'longitude': -117.784423828125, 'latitude': 34.16815524147703}}

## Parse

In [12]:
response.keys()

dict_keys(['businesses', 'total', 'region'])

### ƒ: parse_data

In [13]:
def parse_data(list_of_data):
    '''Adapted from Tyrell's code'''  

    # Create empty list to store results
    
    parsed_data = []
    
    # Loop through each business in the list of businesses
    # Add specific k:v pairs to a dictionary
    # These pairs will be used to build a DF afterwards
    
    for business in list_of_data:
        if 'price' not in business:
            business['price'] = np.nan
            
            # Verify that the "price" key is in the selected business dict
            
        details = {'name': business['name'],
                     'location': ' '.join(business['location']['display_address']),
                     'id': business['id'],
                     #'categories': business['categories'],
                     'alias': business['categories'][0]['alias'],
                     'title': business['categories'][0]['title'],
                     'rating': business['rating'],
                     'review_count': business['review_count'],
                     'price': business['price'],
                     'latitude': business['coordinates']['latitude'],
                     'longitude': business['coordinates']['longitude']
                    }

        # If the "price" key is missing, then skip adding that key
        # This avoids an error when getting the desired info
        
#         else:
#             details = {'name': business['name'],
#                          'location': business['location']['display_address'],
#                          'id': business['id'],
#                          #'categories': business['categories'],
#                          'alias': business['categories'][0]['alias'],
#                          'title': business['categories'][0]['title'],
#                          'rating': business['rating'],
#                          'review_count': business['review_count'],
#                          'latitude': business['coordinates']['latitude'],
#                          'longitude': business['coordinates']['longitude']
#                         }
        
        # Add the new dictionary to the previous list
        
        parsed_data.append(details)
    
    # Adjust the 'location' value to be a single string (not 2 str in a list)
    
#     for biz in parsed_data:
#         biz['location'] = ' '.join(biz['location'])
        
    # Create a DataFrame from the resulting list
    
    df_parsed_data = pd.DataFrame(parsed_data)
    
#     df_parsed_data.dropna(inplace=True)
    
    return df_parsed_data

In [15]:
parsed_results = parse_data(response['businesses'])
parsed_results

Unnamed: 0,name,location,id,alias,title,rating,review_count,price,latitude,longitude
0,San Antonio Winery,"2802 S Milliken Ave Ontario, CA 91761",4N2pC-h3Ba6qUQ7cN4xP-g,wineries,Wineries,4.0,248,$$,34.022224,-117.558985
1,Joseph Filippi Winery,"12467 Baseline Rd Rancho Cucamonga, CA 91739",Q0Cuh5XZY29AdrWpUv8yhw,wineries,Wineries,4.0,158,$$,34.120926,-117.533615
2,Galleano Winery,"4231 Wineville Ave Mira Loma, CA 91752",dMy0GL2mG9hKIIMsfAQAgQ,winetastingroom,Wine Tasting Room,4.0,171,$$,34.0109,-117.54246
3,Third Street Wine Shop,"2142 3rd St La Verne, CA 91750",Liy2V5TissVcWFt9-45AKg,wine_bars,Wine Bars,5.0,101,$$,34.09999,-117.76946
4,The Claremont Packing House,"532 W 1st St Claremont, CA 91711",jJYkAV0acTPOkqESRWjmTA,winetastingroom,Wine Tasting Room,4.0,24,$$,34.094265,-117.721303


In [17]:
parsed_results.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   name          5 non-null      object 
 1   location      5 non-null      object 
 2   id            5 non-null      object 
 3   alias         5 non-null      object 
 4   title         5 non-null      object 
 5   rating        5 non-null      float64
 6   review_count  5 non-null      int64  
 7   price         5 non-null      object 
 8   latitude      5 non-null      float64
 9   longitude     5 non-null      float64
dtypes: float64(3), int64(1), object(6)
memory usage: 528.0+ bytes


In [19]:
type(parsed_results['price'][0])

str

In [16]:
# parsed_results.profile_report()

HBox(children=(HTML(value='Summarize dataset'), FloatProgress(value=0.0, max=23.0), HTML(value='')))




HBox(children=(HTML(value='Generate report structure'), FloatProgress(value=0.0, max=1.0), HTML(value='')))




ValueError: 
$$
^
Expected end of text, found '$'  (at char 0), (line:1, col:1)



In [22]:
# Identify totals of NaN values

parsed_results.isnull().sum()

name             0
location         0
id               0
alias            0
title            0
rating           0
review_count     0
price           13
latitude         0
longitude        0
dtype: int64

In [23]:
# Simple statistical exploration of quantitative values

parsed_results.describe()

Unnamed: 0,rating,review_count,latitude,longitude
count,50.0,50.0,50.0,50.0
mean,4.43,116.62,34.059168,-117.875246
std,0.505177,225.83766,0.301665,0.34668
min,3.0,1.0,33.559875,-118.687958
25%,4.0,8.25,33.816319,-118.133451
50%,4.5,48.0,34.060279,-117.900933
75%,5.0,118.75,34.149483,-117.677867
max,5.0,1392.0,34.6961,-117.039049


## Updating Requests for Pagination

### ƒ: yelp_request_offset

In [262]:
def yelp_request_offset(term, location, yelp_key, offset=0, verbose=False):
    '''Adapted from Yelp API Lab: https://github.com/BenJMcCarty/dsc-yelp-api-lab/tree/solution'''
    
    url = 'https://api.yelp.com/v3/businesses/search'

    headers = {
            'Authorization': 'Bearer {}'.format(yelp_key),
        }

    url_params = {
                    'term': term.replace(' ', '+'),
                    'location': location.replace(' ', '+'),
                    'limit': 50,
                    'offset': offset
                        }
    
    response = requests.get(url, headers=headers, params=url_params)
    
    if verbose == True:
        print(response)
        print(type(response.text))
        print(response.text[:1000])
        
    return response.json()

#### Test 1

In [25]:
test1 = yelp_request_offset('winery', 'San Diego', yelp_key, offset=0, verbose=False)
test1

{'businesses': [{'id': '76ADW8x8J_69qbtsc5F-2g',
   'alias': 'the-winery-restaurant-and-wine-bar-san-diego',
   'name': 'The Winery Restaurant & Wine Bar',
   'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/svRxHx6tJR7I1bUTqp29BA/o.jpg',
   'is_closed': False,
   'url': 'https://www.yelp.com/biz/the-winery-restaurant-and-wine-bar-san-diego?adjust_creative=eyY_X0k6PMB2bu6EM1dXdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=eyY_X0k6PMB2bu6EM1dXdw',
   'review_count': 492,
   'categories': [{'alias': 'bars', 'title': 'Bars'},
    {'alias': 'newamerican', 'title': 'American (New)'}],
   'rating': 4.0,
   'coordinates': {'latitude': 32.8724284, 'longitude': -117.2137748},
   'transactions': ['delivery', 'pickup'],
   'price': '$$',
   'location': {'address1': '4301 La Jolla Village Dr',
    'address2': 'Ste 2040',
    'address3': None,
    'city': 'San Diego',
    'zip_code': '92122',
    'country': 'US',
    'state': 'CA',
    'display_address': ['4301 La Jol

In [26]:
test1.keys()

dict_keys(['businesses', 'total', 'region'])

In [27]:
test1['total']

262

In [28]:
len(test1['businesses'])

50

#### Test 2

In [29]:
test2 = yelp_request_offset('winery', 'San Diego', yelp_key, offset=50, verbose=False)
test2

{'businesses': [{'id': 'cPnZzKMVRXUBEx7oCB9ABA',
   'alias': 'vintage-wines-limited-san-diego',
   'name': 'Vintage Wines Limited',
   'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/JfG1-qrEULE2GAZwUfyZbQ/o.jpg',
   'is_closed': False,
   'url': 'https://www.yelp.com/biz/vintage-wines-limited-san-diego?adjust_creative=eyY_X0k6PMB2bu6EM1dXdw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=eyY_X0k6PMB2bu6EM1dXdw',
   'review_count': 40,
   'categories': [{'alias': 'beer_and_wine', 'title': 'Beer, Wine & Spirits'},
    {'alias': 'wine_bars', 'title': 'Wine Bars'},
    {'alias': 'winetastingroom', 'title': 'Wine Tasting Room'}],
   'rating': 4.5,
   'coordinates': {'latitude': 32.8792, 'longitude': -117.16743},
   'transactions': [],
   'price': '$$',
   'location': {'address1': '6904 Miramar Rd',
    'address2': 'Ste 201',
    'address3': '',
    'city': 'San Diego',
    'zip_code': '92121',
    'country': 'US',
    'state': 'CA',
    'display_address': ['6904

#### Loop

In [30]:
results = yelp_request_offset('winery', 'San Diego', yelp_key, offset=0, verbose=False)

num_pages = results['total']//50+1

print(results['total'])
print(len(results['businesses']))
print(num_pages)

262
50
6


In [32]:
cur = 50
parsed_results_dfs = []

for num in range(num_pages):
    results = yelp_request_offset('winery', 'San Diego', yelp_key, offset=cur, verbose=False)
    parsed_results = parse_data(results['businesses'])
    parsed_results_dfs.append(parsed_results)
    cur += 50
    

In [33]:
parsed_results_dfs

[                                      name  \
 0                    Vintage Wines Limited   
 1               Baja California Wine Tours   
 2                           The Wine Lover   
 3                      Chuparosa Vineyards   
 4                 Aall In Limo & Party Bus   
 5                               Vino Carta   
 6               Principe di Tricase Winery   
 7                              Baja Xpress   
 8                         Serpentine Cider   
 9                       Quigley Fine Wines   
 10                      Continental Bistro   
 11                     Aha Baja Wine Tours   
 12                   Hatfield Creek Winery   
 13            Poochie's Hooch Urban Cidery   
 14                     Record Family Wines   
 15               Dulzura Vineyard & Winery   
 16  Bottlecraft Beer Shop and Tasting Room   
 17                           Wine Smarties   
 18                           Cheval Winery   
 19                   HotShots Winery Tours   
 20          

In [34]:
df_concat = pd.concat(parsed_results_dfs, ignore_index=True)
df_concat

Unnamed: 0,name,location,id,alias,title,rating,review_count,price,latitude,longitude
0,Vintage Wines Limited,"6904 Miramar Rd Ste 201 San Diego, CA 92121",cPnZzKMVRXUBEx7oCB9ABA,beer_and_wine,"Beer, Wine & Spirits",4.5,40,$$,32.879200,-117.167430
1,Baja California Wine Tours,"San Diego, CA 92110",-R9x8f3TaNek2JIP-4OHCA,winetours,Wine Tours,5.0,10,,32.767720,-117.194830
2,The Wine Lover,"3968 5th Ave San Diego, CA 92103",xsnS_WeHv2QYOK2N1PKmTA,wine_bars,Wine Bars,4.0,205,$$,32.749550,-117.160780
3,Chuparosa Vineyards,"910 Gem Ln Ramona, CA 92065",D9rtDihupzA39yBh0YcAXg,wineries,Wineries,5.0,39,$$,33.009689,-116.858495
4,Aall In Limo & Party Bus,"655 W Broadway Ste 1300 San Diego, CA 92101",pnZcC9NQxldBQyK9ccNVOQ,limos,Limos,5.0,162,,32.715375,-117.168598
...,...,...,...,...,...,...,...,...,...,...
142,San Diego Limobuses,"3333 Midway Dr Ste 206 San Diego, CA 92110",SCaFGyzrTGTI6aQHhLxbgA,limos,Limos,3.0,46,,32.750060,-117.211380
143,Alpine Discount Liquor,"2223 Alpine Blvd Alpine, CA 91901",-ARx5ShNxJgjyahKnikTnA,beer_and_wine,"Beer, Wine & Spirits",3.5,3,$,32.835287,-116.765910
144,Village Wine & Spirits,"1552 Encinitas Blvd Encinitas, CA 92024",XkGnb-YxP5MK_ok1X011RA,beer_and_wine,"Beer, Wine & Spirits",4.0,24,$$,33.045896,-117.255584
145,The Destination Wedding Group,"Escondido, CA 92033",lJwxe_fjdt-e8xPrDq63fA,wedding_planning,Wedding Planning,5.0,18,,33.123470,-117.086520


## Save

### ƒ: df_save

Save parsed results to a .csv file

In [263]:
df.shape

(82, 10)

In [35]:
df_concat.to_csv('data\wineries.csv')

In [36]:
df = pd.read_csv("data/wineries.csv", index_col=0)
df

Unnamed: 0,name,location,id,alias,title,rating,review_count,price,latitude,longitude
0,Vintage Wines Limited,"6904 Miramar Rd Ste 201 San Diego, CA 92121",cPnZzKMVRXUBEx7oCB9ABA,beer_and_wine,"Beer, Wine & Spirits",4.5,40,$$,32.879200,-117.167430
1,Baja California Wine Tours,"San Diego, CA 92110",-R9x8f3TaNek2JIP-4OHCA,winetours,Wine Tours,5.0,10,,32.767720,-117.194830
2,The Wine Lover,"3968 5th Ave San Diego, CA 92103",xsnS_WeHv2QYOK2N1PKmTA,wine_bars,Wine Bars,4.0,205,$$,32.749550,-117.160780
3,Chuparosa Vineyards,"910 Gem Ln Ramona, CA 92065",D9rtDihupzA39yBh0YcAXg,wineries,Wineries,5.0,39,$$,33.009689,-116.858495
4,Aall In Limo & Party Bus,"655 W Broadway Ste 1300 San Diego, CA 92101",pnZcC9NQxldBQyK9ccNVOQ,limos,Limos,5.0,162,,32.715375,-117.168598
...,...,...,...,...,...,...,...,...,...,...
142,San Diego Limobuses,"3333 Midway Dr Ste 206 San Diego, CA 92110",SCaFGyzrTGTI6aQHhLxbgA,limos,Limos,3.0,46,,32.750060,-117.211380
143,Alpine Discount Liquor,"2223 Alpine Blvd Alpine, CA 91901",-ARx5ShNxJgjyahKnikTnA,beer_and_wine,"Beer, Wine & Spirits",3.5,3,$,32.835287,-116.765910
144,Village Wine & Spirits,"1552 Encinitas Blvd Encinitas, CA 92024",XkGnb-YxP5MK_ok1X011RA,beer_and_wine,"Beer, Wine & Spirits",4.0,24,$$,33.045896,-117.255584
145,The Destination Wedding Group,"Escondido, CA 92033",lJwxe_fjdt-e8xPrDq63fA,wedding_planning,Wedding Planning,5.0,18,,33.123470,-117.086520


#### Old Stuff

In [None]:
# from helper_funcs import *

In [None]:


# # create a variable  to keep track of which result you are in. 
# cur = 0

# #set up a while loop to go through and grab the result 
# while cur < num and cur < 1000:
#     #set the offset parameter to be where you currently are in the results 

#     #make your API call with the new offset number
#     results = yelp_call_offset('winery', 'San Diego', yelp_key, offset=cur, verbose=False)
    
#     #after you get your results you can now use your function to parse those results
#     parsed_results = parse_results(results)
    
#     # use your function to insert your parsed results into the db
#     df_save(parsed_results)
#     #increment the counter by 50 to move on to the next results
#     cur += 50

## Edit and Condense

Goal: condense the details down to specific functions (perhaps one function?) to pull, clean, and save data.

While it will take some experimentation to write the functions above, once you get them working it will be best to put them in a `.py` file and then import the functions to use in a script 

# ƒ: GET FULL DATA

In [37]:
results = yelp_request_offset('winery', 'San Diego', yelp_key, offset=0, verbose=False)

num_pages = results['total']//50+1

print(results['total'])
print(len(results['businesses']))
print(num_pages)

262
50
6


In [40]:
def get_full_data(term, location, yelp_key, file_name = 'data/wineries.csv'):
    '''Requests all results from Yelp API; saves as a .csv; and returns a DataFrame.'''
    blank_df = pd.DataFrame()
    blank_df.to_csv(file_name)
    
    # Process first request to Yelp API and calculate number of pages 
    results = yelp_request_offset(term, location, yelp_key, offset=0, verbose=False)
    num_pages = results['total']//50+1
    
    # Print out confirmation feedback
    print(f'For {term} and {location}: ')
    print(f"    Total number of results: {results['total']}.")
    print(f'    Total number of pages: {num_pages}.')
    
    # Create offset for further results and create empty list
    cur = 0
    parsed_results_dfs = []

    # Retrieves remaining pages
    for num in range(num_pages-1):
        try:
            results = yelp_request_offset(term, location, yelp_key, offset=cur, verbose=False)
            parsed_results = parse_data(results['businesses'])
            parsed_results_dfs.append(parsed_results)
            cur += 50
        except:
            print(f'Error on page {num}.')
            parsed_results_dfs.to_csv(file_name, mode='a')

    # Concatenate DataFrames and save to .csv
    df_concat = pd.concat(parsed_results_dfs, ignore_index=True)

    try:
        df_concat.to_csv(file_name)
        print(f'Saved to {file_name}.')
    except:
        print(f'Error, did not save.')
        
    return df_concat

In [167]:
df2 = get_full_data('winery', 'San Diego', yelp_key)
df2

For winery and San Diego: 
    Total number of results: 262.
    Total number of pages: 6.
Saved to data/wineries.csv.


Unnamed: 0,name,location,id,alias,title,rating,review_count,price,latitude,longitude
0,The Winery Restaurant & Wine Bar,"4301 La Jolla Village Dr Ste 2040 San Diego, C...",76ADW8x8J_69qbtsc5F-2g,bars,Bars,4.0,492,$$,32.872428,-117.213775
1,Baja Winery Tours,"4629 Cass St San Diego, CA 92109",vVaNDvLrCCE_Cw_DyPnBpA,winetours,Wine Tours,5.0,66,,32.798916,-117.252111
2,Bernardo Winery,"13330 Paseo Del Verano Norte San Diego, CA 92128",DknnpiG1p4OoM1maFshzXA,winetastingroom,Wine Tasting Room,4.5,626,$$,33.032800,-117.046460
3,Callaway Vineyard & Winery,"517 4th Ave Ste 101 San Diego, CA 92101",Cn2_bpTngghYW1ej4zreZg,winetastingroom,Wine Tasting Room,5.0,100,$$,32.710751,-117.160918
4,Négociant Winery,"2419 El Cajon Blvd San Diego, CA 92104",Cc1sQWRWgGyMCjzX2mmMQQ,winetastingroom,Wine Tasting Room,4.5,103,$$,32.754880,-117.138280
...,...,...,...,...,...,...,...,...,...,...
192,San Diego Limobuses,"3333 Midway Dr Ste 206 San Diego, CA 92110",SCaFGyzrTGTI6aQHhLxbgA,limos,Limos,3.0,46,,32.750060,-117.211380
193,Alpine Discount Liquor,"2223 Alpine Blvd Alpine, CA 91901",-ARx5ShNxJgjyahKnikTnA,beer_and_wine,"Beer, Wine & Spirits",3.5,3,$,32.835287,-116.765910
194,Village Wine & Spirits,"1552 Encinitas Blvd Encinitas, CA 92024",XkGnb-YxP5MK_ok1X011RA,beer_and_wine,"Beer, Wine & Spirits",4.0,24,$$,33.045896,-117.255584
195,The Destination Wedding Group,"Escondido, CA 92033",lJwxe_fjdt-e8xPrDq63fA,wedding_planning,Wedding Planning,5.0,18,,33.123470,-117.086520


In [168]:
# print(df2['name'].to_string())

In [169]:
# df3 = get_full_data('pizza', 'Baltimore', yelp_key)

In [170]:
# df3

## Cleaning Data

In [171]:
# df2.isna().sum()

In [172]:
# df2[df2['price'].isna()]

## Identifying top 3 aliases

In [186]:
# Create new DF showing only alias and title columns
df2_alias = df2.loc[:,['alias', 'title']]

# Identify top 2 aliases 
df2_alias_count = df2_alias.groupby('alias').count().sort_values(['title'],\
                                                        ascending=False)[:2]
df2_alias_count.reset_index(inplace=True)

print(df2_alias_count)

# display them as a list
aliases_top_3 = df2_alias_count['alias'].tolist()
print(aliases_top_3)

# Note: initially tried top 3, but it returned distributors, not wineries

             alias  title
0         wineries     49
1  winetastingroom     33
['wineries', 'winetastingroom']


In [187]:
# Selecting rows based on condition

df3 = df2[df2['alias'].isin(aliases_top_3)]
df3

Unnamed: 0,name,location,id,alias,title,rating,review_count,price,latitude,longitude
2,Bernardo Winery,"13330 Paseo Del Verano Norte San Diego, CA 92128",DknnpiG1p4OoM1maFshzXA,winetastingroom,Wine Tasting Room,4.5,626,$$,33.032800,-117.046460
3,Callaway Vineyard & Winery,"517 4th Ave Ste 101 San Diego, CA 92101",Cn2_bpTngghYW1ej4zreZg,winetastingroom,Wine Tasting Room,5.0,100,$$,32.710751,-117.160918
4,Négociant Winery,"2419 El Cajon Blvd San Diego, CA 92104",Cc1sQWRWgGyMCjzX2mmMQQ,winetastingroom,Wine Tasting Room,4.5,103,$$,32.754880,-117.138280
5,San Pasqual Winery - Seaport Village,"805 W Harbor Dr San Diego, CA 92101",gMW1RvyLu90RSQAY9UrIHw,winetastingroom,Wine Tasting Room,4.5,138,$$,32.708732,-117.168195
6,Domaine Artefact Vineyard & Winery,"15404 Highland Valley Rd Escondido, CA 92025",WqVbxY77Ag96X90LultCUw,wineries,Wineries,5.0,96,$$,33.068170,-117.001600
...,...,...,...,...,...,...,...,...,...,...
140,Roll OutThe Barrell Charity Event by Meritage,"162 S Rancho Santa Fe Rd Encinitas, CA 92024",wyLm9fIoamN-VALcu3nUVg,wineries,Wineries,4.0,1,,33.037121,-117.238654
145,Licores Kentucky,Calle Puerto y 3ra S/N Col. Centro 22000 Tijua...,B7gID-M2EsdpthrTcwTNYA,wineries,Wineries,5.0,1,,32.534236,-117.034976
148,Barrica 9,Av. Revolución 1265 Col. Zona Centro 22000 Tij...,HxTqmzT4G43iAKXrB3pqQg,winetastingroom,Wine Tasting Room,4.5,7,$$,32.530430,-117.036500
165,"RL Liquid Assets, Inc","5909 Sea Lion Pl Ste G Carlsbad, CA 92010",-STecUUsS69EMSE7PxwPwA,wineries,Wineries,3.0,2,,33.134743,-117.248093


In [206]:
df3.to_csv('data/filtered_aliases.csv')

In [207]:
df_saved = pd.read_csv("data/filtered_aliases.csv", index_col=0)
df_saved

Unnamed: 0,name,location,id,alias,title,rating,review_count,price,latitude,longitude
2,Bernardo Winery,"13330 Paseo Del Verano Norte San Diego, CA 92128",DknnpiG1p4OoM1maFshzXA,winetastingroom,Wine Tasting Room,4.5,626,$$,33.032800,-117.046460
3,Callaway Vineyard & Winery,"517 4th Ave Ste 101 San Diego, CA 92101",Cn2_bpTngghYW1ej4zreZg,winetastingroom,Wine Tasting Room,5.0,100,$$,32.710751,-117.160918
4,Négociant Winery,"2419 El Cajon Blvd San Diego, CA 92104",Cc1sQWRWgGyMCjzX2mmMQQ,winetastingroom,Wine Tasting Room,4.5,103,$$,32.754880,-117.138280
5,San Pasqual Winery - Seaport Village,"805 W Harbor Dr San Diego, CA 92101",gMW1RvyLu90RSQAY9UrIHw,winetastingroom,Wine Tasting Room,4.5,138,$$,32.708732,-117.168195
6,Domaine Artefact Vineyard & Winery,"15404 Highland Valley Rd Escondido, CA 92025",WqVbxY77Ag96X90LultCUw,wineries,Wineries,5.0,96,$$,33.068170,-117.001600
...,...,...,...,...,...,...,...,...,...,...
140,Roll OutThe Barrell Charity Event by Meritage,"162 S Rancho Santa Fe Rd Encinitas, CA 92024",wyLm9fIoamN-VALcu3nUVg,wineries,Wineries,4.0,1,,33.037121,-117.238654
145,Licores Kentucky,Calle Puerto y 3ra S/N Col. Centro 22000 Tijua...,B7gID-M2EsdpthrTcwTNYA,wineries,Wineries,5.0,1,,32.534236,-117.034976
148,Barrica 9,Av. Revolución 1265 Col. Zona Centro 22000 Tij...,HxTqmzT4G43iAKXrB3pqQg,winetastingroom,Wine Tasting Room,4.5,7,$$,32.530430,-117.036500
165,"RL Liquid Assets, Inc","5909 Sea Lion Pl Ste G Carlsbad, CA 92010",-STecUUsS69EMSE7PxwPwA,wineries,Wineries,3.0,2,,33.134743,-117.248093


### Ignore - old code

In [180]:
# df2_alias_sorted = df2_alias_count.sort_values(['title'], ascending=False)
# df3_alias = pd.DataFrame(df2_alias_sorted)
# df3_alias

# df3_alias.groupby('alias').sum().sort_values(['title'], ascending=False)[:3]

# df_alias_top_3 = df3_alias.loc[:, 'title']#[:3]
# df_alias_top_3

In [177]:
# df2_alias = df2.loc[:,'alias']

# df2_alias_count = df2.groupby('alias').count()#.sort_values(['alias'], ascending=False)#[:3]
# df2_alias_count.reset_index(inplace=True)

# df2_alias_count

# Part 3 -  Create ETL pipeline for the restaurant review data from the API

You've done this for the Businesses, now you need to do this for reviews. You will follow the same process, but your functions will be specific to reviews. Above you have a model of the functions you will need to write, and how to pull them together in one script. For this part, you have the process below 

## Getting Business IDs

- In order to pull the reveiws, you will need the business ids. So your first step will be to get all of the business ids from your businesses csv. 

### Open file and slice ID

1. Open data/wineries.csv
2. Slice out the 'name' and 'id' columns for each row

In [212]:
df_saved = pd.read_csv("data/filtered_aliases.csv", index_col=0)
df_saved.reset_index(drop=True, inplace=True)
df_saved

Unnamed: 0,name,location,id,alias,title,rating,review_count,price,latitude,longitude
0,Bernardo Winery,"13330 Paseo Del Verano Norte San Diego, CA 92128",DknnpiG1p4OoM1maFshzXA,winetastingroom,Wine Tasting Room,4.5,626,$$,33.032800,-117.046460
1,Callaway Vineyard & Winery,"517 4th Ave Ste 101 San Diego, CA 92101",Cn2_bpTngghYW1ej4zreZg,winetastingroom,Wine Tasting Room,5.0,100,$$,32.710751,-117.160918
2,Négociant Winery,"2419 El Cajon Blvd San Diego, CA 92104",Cc1sQWRWgGyMCjzX2mmMQQ,winetastingroom,Wine Tasting Room,4.5,103,$$,32.754880,-117.138280
3,San Pasqual Winery - Seaport Village,"805 W Harbor Dr San Diego, CA 92101",gMW1RvyLu90RSQAY9UrIHw,winetastingroom,Wine Tasting Room,4.5,138,$$,32.708732,-117.168195
4,Domaine Artefact Vineyard & Winery,"15404 Highland Valley Rd Escondido, CA 92025",WqVbxY77Ag96X90LultCUw,wineries,Wineries,5.0,96,$$,33.068170,-117.001600
...,...,...,...,...,...,...,...,...,...,...
77,Roll OutThe Barrell Charity Event by Meritage,"162 S Rancho Santa Fe Rd Encinitas, CA 92024",wyLm9fIoamN-VALcu3nUVg,wineries,Wineries,4.0,1,,33.037121,-117.238654
78,Licores Kentucky,Calle Puerto y 3ra S/N Col. Centro 22000 Tijua...,B7gID-M2EsdpthrTcwTNYA,wineries,Wineries,5.0,1,,32.534236,-117.034976
79,Barrica 9,Av. Revolución 1265 Col. Zona Centro 22000 Tij...,HxTqmzT4G43iAKXrB3pqQg,winetastingroom,Wine Tasting Room,4.5,7,$$,32.530430,-117.036500
80,"RL Liquid Assets, Inc","5909 Sea Lion Pl Ste G Carlsbad, CA 92010",-STecUUsS69EMSE7PxwPwA,wineries,Wineries,3.0,2,,33.134743,-117.248093


In [213]:
# Slice out the elements from the "name" column

df_saved_name = df_saved['name'].to_list()
df_saved_name

['Bernardo Winery',
 'Callaway Vineyard & Winery',
 'Négociant Winery',
 'San Pasqual Winery - Seaport Village',
 'Domaine Artefact Vineyard & Winery',
 'Carruth Cellars Wine Garden',
 'Blue Door Urban Winery',
 'Cordiano Winery',
 'FruitCraft - Fermentery & Distillery',
 'Pali Wine Co',
 'LJ Crafted Wines - Wines & Tastings',
 'Speckle Rock Vineyards',
 'San Pasqual Winery',
 'Altipiano Vineyard & Winery',
 'Gianni Buonomo Vintners',
 'Koi Zen Cellars',
 'La Mesa Wine Works',
 'Highland Valley Vineyards',
 '80 Sips Around the World',
 'Granite Lion Cellars',
 'Three Hills Winery',
 'Abnormal Wine Company',
 "Woof'n Rose Winery and Vineyard",
 "Bastian's Vineyards",
 'Vineyard Grant James',
 'Rustic Ridge Vineyards',
 'Espinosa Vineyards and Winery',
 'Charlie & Echo',
 "Rose's Tasting Room",
 'So Cal Wines',
 'San Pasqual Winery Tasting Room & Gallery',
 'California Wine Line',
 'Chuparosa Vineyards',
 'Principe di Tricase Winery',
 'Quigley Fine Wines',
 'Hatfield Creek Winery',
 'Re

In [214]:
df_saved_id = df_saved['id'].to_list()
df_saved_id

['DknnpiG1p4OoM1maFshzXA',
 'Cn2_bpTngghYW1ej4zreZg',
 'Cc1sQWRWgGyMCjzX2mmMQQ',
 'gMW1RvyLu90RSQAY9UrIHw',
 'WqVbxY77Ag96X90LultCUw',
 'yZp9FdMH6Dmn98mfNInFHw',
 'ElE6Nj7iz-tNV4ebV6Clew',
 'Ub2bJsi7lIOQ9TyIKdHaJw',
 'sCET1pLdKNNPBQJyjPOkww',
 'fyh566YXm5XJ3Ntv_GLghg',
 '7i8AdZoDySoEdT_uoMMzNA',
 'ALTc2EAwGkWfVL9gpuGdqw',
 'Phqm2QvJE0uEqpxMvdzH_g',
 'NKsyDqmKiNWT4HtkBh6AdA',
 'o0flINbzGa8LDlM9Mtbfwg',
 'Oic9__gID0APdrBIjP4Xjw',
 'YVMiPOAPsEd1fXxNmpjEMA',
 '9qalQETuaZnFDgZrsEB7MA',
 'XH7njeu3FvhxoTqzHTTE5g',
 '1PS0mcUZkJPyG72pk3zwVg',
 'Ye4Pr0ZbC8sGPwpWqR5wvQ',
 'ptA-I2Kk78jydAeJIYs2Cg',
 'f62YFfiriHapBhadXfMquA',
 'BGfXAtLC5sHBWOKlcb17KA',
 'gFPq8yqFRd9caMgkO5HvpQ',
 '97NwIhRGPhv1c4DU9DMpPA',
 'wLEvfL3dKvMMPD17-lpcGw',
 '9YDsJ5OL1hkx_vqwlJhi_Q',
 'VksLDK5jI-O53y0fFoUlQQ',
 '0gGOIJo3MUVpuy8hAepCGQ',
 'ia3GRsrEDexiodnpXe0D6w',
 '5fkdGagUJDuc3axRWchzJQ',
 'D9rtDihupzA39yBh0YcAXg',
 'RdMY32yM5MRIztG9gzuDKw',
 'U1F6GFNI_nw4WawP546qHQ',
 'V3kR-0Nb6ISBQ8ebXZJeVQ',
 '3mk6Tl3irp7x8baYoLRu9w',
 

## Requesting Reviews

- Write a function that takes a business id and makes a call to the API for reviews.


### ƒ: yelp_id_offset

In [218]:
#Testing how to create URL properly for API request

biz_id = "DknnpiG1p4OoM1maFshzXA"
url = 'https://api.yelp.com/v3/businesses/' + biz_id + '/reviews'
url

'https://api.yelp.com/v3/businesses/DknnpiG1p4OoM1maFshzXA/reviews'

In [None]:
f'https /{biz_id}/reviews

In [None]:
for biz in biz_ids:
    reviews = reviewcall(biz)
    parsed_reveiws = parse_reviews(reviews, biz)
    

In [229]:
def yelp_id_offset(biz_id, yelp_key, offset=0, verbose=True):
    '''Adapted from Yelp API Lab: https://github.com/BenJMcCarty/dsc-yelp-api-lab/tree/solution'''
    
    url = 'https://api.yelp.com/v3/businesses/'+ f'{biz_id}' + '/reviews'

    headers = {
            'Authorization': 'Bearer {}'.format(yelp_key),
        }

    url_params = {
#                     'id': biz_id.replace(' ', '+'),
                    'limit': 50,
                    'offset': offset
                        }
    
    response = requests.get(url, headers=headers, params=url_params)
    
    if verbose == True:
        print(response)
        print(type(response.text))
        print(response.text[:1000])
        
        
    return response.json()

#### Test ID Offset

In [231]:
reviews1 = yelp_id_offset(biz_id, yelp_key, offset=0, verbose=True)
reviews1.keys()

<Response [200]>
<class 'str'>
{"reviews": [{"id": "_aZ_3xAVPVJOVC_LlL0GeQ", "url": "https://www.yelp.com/biz/bernardo-winery-san-diego-2?adjust_creative=eyY_X0k6PMB2bu6EM1dXdw&hrid=_aZ_3xAVPVJOVC_LlL0GeQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=eyY_X0k6PMB2bu6EM1dXdw", "text": "Debbie and Ronnie were wonderfully attentive to our needs. Had the best of the both red and white wines. Well worth spending an afternoon just sipping wines...", "rating": 5, "time_created": "2021-04-13 19:18:16", "user": {"id": "5dlWmmwLrkLsRD_Z0697OQ", "profile_url": "https://www.yelp.com/user_details?userid=5dlWmmwLrkLsRD_Z0697OQ", "image_url": null, "name": "Patrick T."}}, {"id": "_HFTJW4nRUe5lC8M1LLflA", "url": "https://www.yelp.com/biz/bernardo-winery-san-diego-2?adjust_creative=eyY_X0k6PMB2bu6EM1dXdw&hrid=_HFTJW4nRUe5lC8M1LLflA&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=eyY_X0k6PMB2bu6EM1dXdw", "text": "I haven't gone to the winery since pre covid

dict_keys(['reviews', 'total', 'possible_languages'])

In [261]:
print(len(reviews1['reviews']))

3


##### Explore Reviews1 Keys

In [232]:
reviews1.keys()

dict_keys(['reviews', 'total', 'possible_languages'])

In [234]:
print(type(reviews1['reviews']))

<class 'list'>


In [240]:
reviews1['reviews'][0]

{'id': '_aZ_3xAVPVJOVC_LlL0GeQ',
 'url': 'https://www.yelp.com/biz/bernardo-winery-san-diego-2?adjust_creative=eyY_X0k6PMB2bu6EM1dXdw&hrid=_aZ_3xAVPVJOVC_LlL0GeQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=eyY_X0k6PMB2bu6EM1dXdw',
 'text': 'Debbie and Ronnie were wonderfully attentive to our needs. Had the best of the both red and white wines. Well worth spending an afternoon just sipping wines...',
 'rating': 5,
 'time_created': '2021-04-13 19:18:16',
 'user': {'id': '5dlWmmwLrkLsRD_Z0697OQ',
  'profile_url': 'https://www.yelp.com/user_details?userid=5dlWmmwLrkLsRD_Z0697OQ',
  'image_url': None,
  'name': 'Patrick T.'}}

In [237]:
reviews1['reviews'][0].keys()

dict_keys(['id', 'url', 'text', 'rating', 'time_created', 'user'])

In [254]:
reviews1['reviews'][0]['user']

{'id': '5dlWmmwLrkLsRD_Z0697OQ',
 'profile_url': 'https://www.yelp.com/user_details?userid=5dlWmmwLrkLsRD_Z0697OQ',
 'image_url': None,
 'name': 'Patrick T.'}

### ƒ: Parse_Reviews

In [257]:
def parse_reviews(list_of_reviews):
    '''Adapted from Tyrell's code'''  

    # Create empty list to store results
    
    parsed_reviews = []
    
    # Loop through each review in the list of reviews
    # Add specific k:v pairs to a dictionary
    # These pairs will be used to build a DF afterwards
    
    print('Completed: create list ' + f'{parsed_reviews}')
    
    for review in list_of_reviews:
        details = {'Reviewer Name': review['user']['name'],
            'Review ID': review['id'],
            'Time Created': review['time_created'],
            'Review Rating': review['rating'],
            'Review Text': review['text']
            }
        
        # Add the new dictionary to the empty list
        
    parsed_reviews.append(details)
    
    print('Completed: append list')
    
    # Create a DataFrame from the resulting list
    
    df_parsed_reviews = pd.DataFrame(parsed_reviews)

    print('Created DF')
    
    return df_parsed_reviews

In [259]:
df_parsed_reviews = parse_reviews(reviews1['reviews'])
df_parsed_reviews

Completed: create list []
Completed: append list
Created DF


Unnamed: 0,Reviewer Name,Review ID,Time Created,Review Rating,Review Text
0,stephanie l.,AmyrN0N2SjWI_Uzy3Uq0jQ,2021-02-21 17:19:02,5,I can't believe I never been here before! It's...


### ƒ: Get_Full_Reviews

In [216]:
def get_reviews(biz_id, yelp_key):
    '''Requests all results from Yelp API; saves as a .csv; and returns a DataFrame.'''
    
    # Process first request to Yelp API and calculate number of pages 
    results = yelp_id_offset(biz_id, yelp_key, offset=0, verbose=False)
    num_pages = results['total']//50+1
    
    # Print out confirmation feedback
    print(f'For {biz_id}: ')
    print(f"    Total number of results: {results['total']}.")
    print(f'    Total number of pages: {num_pages}.')
    
    # Create offset for further results and create empty list
    cur = 50
    parsed_results_dfs = []

    # Retrieves remaining pages
    for num in range(num_pages-1):
        try:
            results = yelp_id_offset(biz_id, yelp_key, offset=cur, verbose=False)
            parsed_results = parse_data(results['businesses'])
            parsed_results_dfs.append(parsed_results)
            cur += 50
        except:
            print(f'Error on page {num}.')

    # Concatenate DataFrames and save to .csv
    df_concat = pd.concat(parsed_results_dfs, ignore_index=True)
   
    try:
        df_concat.to_csv('data/business_ids.csv')
        print(f'Saved to data/business_ids.csv.')
    except:
        print(f'Error, did not save.')
        
    return df_concat

## Parsing Reviews

- Write a function to parse out the relevant information from the reviews

### Adjust prior function

In [88]:
# def parse_data(list_of_data):
#     '''Adapted from Tyrell's code'''  


#     parsed_data = []
     
#     for business in list_of_data:
#         if 'price' in business:
#             details = {'name': business['name'],
#                          'location': business['location']['display_address'],
#                          'id': business['id'],
#                          #'categories': business['categories'],
#                          'alias': business['categories'][0]['alias'],
#                          'title': business['categories'][0]['title'],
#                          'rating': business['rating'],
#                          'review_count': business['review_count'],
#                          'price': business['price'],
#                          'latitude': business['coordinates']['latitude'],
#                          'longitude': business['coordinates']['longitude']
#                         }
#         else:
#             details = {'name': business['name'],
#                          'location': business['location']['display_address'],
#                          'id': business['id'],
#                          #'categories': business['categories'],
#                          'alias': business['categories'][0]['alias'],
#                          'title': business['categories'][0]['title'],
#                          'rating': business['rating'],
#                          'review_count': business['review_count'],
#                          'latitude': business['coordinates']['latitude'],
#                          'longitude': business['coordinates']['longitude']
#                         }
    
#         parsed_data.append(details)
    
#     for biz in parsed_data:
#         biz['location'] = ' '.join(biz['location'])
        
#     df_parsed_data = pd.DataFrame(parsed_data)
    
# #     df_parsed_data.dropna(inplace=True)
    
#     return df_parsed_data

## Saving Parsed Data

- Write a function to save the parse data into a csv file containing all of the reviews. 

### Adjust function to save to new file

In [87]:
# def get_full_data(term, location, yelp_key, file_name = 'data/wineries.csv'):
#     '''Requests all results from Yelp API; saves as a .csv; and returns a DataFrame.'''
    
#     # Process first request to Yelp API and calculate number of pages 
#     results = yelp_request_offset(term, location, yelp_key, offset=0, verbose=False)
#     num_pages = results['total']//50+1
    
#     # Print out confirmation feedback
#     print(f'For {term} and {location}: ')
#     print(f"    Total number of results: {results['total']}.")
#     print(f'    Total number of pages: {num_pages}.')
    
#     # Create offset for further results and create empty list
#     cur = 50
#     parsed_results_dfs = []

#     # Retrieves remaining pages
#     for num in range(num_pages-1):
#         try:
#             results = yelp_request_offset(term, location, yelp_key, offset=cur, verbose=False)
#             parsed_results = parse_data(results['businesses'])
#             parsed_results_dfs.append(parsed_results)
#             cur += 50
#         except:
#             print(f'Error on page {num}.')

#     # Concatenate DataFrames and save to .csv
#     df_concat = pd.concat(parsed_results_dfs, ignore_index=True)

#     try:
#         df_concat.to_csv(file_name)
#         print(f'Saved to {file_name}.')
#     except:
#         print(f'Error, did not save.')
        
#     return df_concat

## Combining Functions

- Combine the functions above into a single script  

# Part 4 -  Using python and pandas, write code to answer the questions below. 

**Reviews**

Which are the 5 most reviewed businesses in your dataset?

What is the highest rating received in your data set and how many businesses have that rating?

What percentage of businesses have a rating greater than or  4.5?

What percentage of businesses have a rating less than 3?

---

**Pricing**

What percentage of your businesses have a price label of one dollar sign? Two dollar signs? Three dollar signs? No dollar signs?

---

**Returing Reviews**

Return the text of the reviews for the most reviewed business. 

Find the highest rated business and return text of the most recent review. If multiple business have the same rating, select the business with the most reviews. 

Find the lowest rated business and return text of the most recent review.  If multiple business have the same rating, select the business with the least reviews. 

___

# Reference help

###  Pagination

Returning to the Yelp API, the [documentation](https://www.yelp.com/developers/documentation/v3/business_search) also provides us details regarding the API limits. These often include details about the number of requests a user is allowed to make within a specified time limit and the maximum number of results to be returned. In this case, we are told that any request has a maximum of 50 results per request and defaults to 20. Furthermore, any search will be limited to a total of 1000 results. To retrieve all 1000 of these results, we would have to page through the results piece by piece, retriving 50 at a time. Processes such as these are often refered to as pagination.

Now that you have an initial response, you can examine the contents of the json container. For example, you might start with ```response.json().keys()```. Here, you'll see a key for `'total'`, which tells you the full number of matching results given your query parameters. Write a loop (or ideally a function) which then makes successive API calls using the offset parameter to retrieve all of the results (or 5000 for a particularly large result set) for the original query. As you do this, be mindful of how you store the data. 

**Note: be mindful of the API rate limits. You can only make 5000 requests per day, and APIs can make requests too fast. Start prototyping small before running a loop that could be faulty. You can also use time.sleep(n) to add delays. For more details see https://www.yelp.com/developers/documentation/v3/rate_limiting.**

***Below is sample code that you can use to help you deal with the pagination parameter and bring all of the functions together.***


***Also, something might cause your code to break while it is running. You don't want to constantly repull the same data when this happens, so you should insert the data into the database as you call and parse it, not after you have all of the data***


In [None]:
# create a variable  to keep track of which result you are in. 
cur = 0

#set up a while loop to go through and grab the result 
while cur < num and cur < 1000:
    #set the offset parameter to be where you currently are in the results 
    url_params['offset'] = cur
    #make your API call with the new offset number
    results = yelp_call(url_params, api_key)
    
    #after you get your results you can now use your function to parse those results
    parsed_results = parse_results(results)
    
    # use your function to insert your parsed results into the db
    db_insert(parsed_results)
    #increment the counter by 50 to move on to the next results
    cur += 20