# Yelp API - Lab


## Introduction 

We've seen how the Yelp API works and how to create basic visualizations using Folium. It's time to put those skills to work in order to create a working map! Taking things a step further, you'll also independently explore how to perform pagination in order to retrieve a full results set from the Yelp API!

## Objectives

You will be able to: 

* Use pagination and multiple functions to gather large amounts of data from an API, then parse the data and make sense of it with meaningful analysis
* Create maps using Folium

## Problem Introduction

You've now worked with some API calls, but we have yet to see how to retrieve a more complete dataset programmatically. Returning to the Yelp API, the [documentation](https://www.yelp.com/developers/documentation/v3/business_search) also describes the API limits. Such limits typically cover the number of requests a user may make within a given time window and the maximum number of results returned. In this case, each request returns at most 50 results (20 by default), and any search is limited to a total of 1000 results. To retrieve all 1000 of these results, we have to page through them piece by piece, 50 at a time. This process is often referred to as pagination.
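As a rough illustration of the arithmetic above, the sketch below lists the `offset` values needed to page through a result set 50 at a time. The function name `page_offsets`, its defaults, and the total of 3400 are assumptions made for this example; they are not part of the Yelp API.

```python
def page_offsets(total, page_size=50, max_results=1000):
    """Return the offset for each page needed to cover `total` results.

    page_size and max_results default to the limits quoted above
    (50 results per request, 1000 results per search).
    """
    reachable = min(total, max_results)
    return list(range(0, reachable, page_size))


# Hypothetical query that matches 3400 businesses.
offsets = page_offsets(total=3400)
print(len(offsets))              # 20 requests are needed
print(offsets[:3], offsets[-1])  # [0, 50, 100] 950
```

Even though the hypothetical query matches 3400 businesses, only the first 1000 are reachable, so the last useful offset is 950.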

In this lab, you will define a search and then paginate over the results to retrieve all of the results. You'll then parse these responses as a DataFrame (for further exploration) and create a map using Folium to visualize the results geographically.

## Part I - Make the Initial Request

Start by making an initial request to the Yelp API. Your search must include at least 2 parameters: **term** and **location**. For example, you might search for pizza restaurants in NYC. The term and location are up to you; make the request below.

In [1]:
import json

def get_keys(path):
    with open(path) as f:
        return json.load(f)

In [2]:
keys = get_keys("/Users/wvsharber/.secret/yelp_api.json")

api_key = keys['api_key']

In [16]:
import requests
term = 'coffee'
location = 'Seattle WA'
SEARCH_LIMIT = 1

url = 'https://api.yelp.com/v3/businesses/search'

headers = {
        'Authorization': 'Bearer {}'.format(api_key),
    }

url_params = {
                'term': term.replace(' ', '+'),
                'location': location.replace(' ', '+'),
                'limit': SEARCH_LIMIT
            }
response = requests.get(url, headers=headers, params=url_params)
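
A side note on the `replace(' ', '+')` calls in the request above: as far as we understand, `requests` URL-encodes the `params` dictionary itself, so the manual replacement is probably unnecessary. The sketch below uses the standard library's `urllib.parse.urlencode` as a stand-in to show the effect. The assumption that `requests` encodes parameters the same way is worth verifying by printing `response.url`.

```python
from urllib.parse import urlencode

# Spaces are encoded as '+' automatically.
plain = urlencode({'term': 'coffee', 'location': 'Seattle WA'})
print(plain)     # term=coffee&location=Seattle+WA

# A literal '+' is itself escaped, so a manual replacement sends "Seattle+WA".
replaced = urlencode({'term': 'coffee', 'location': 'Seattle WA'.replace(' ', '+')})
print(replaced)  # term=coffee&location=Seattle%2BWA
```

The search evidently still works with the literal plus sign, as the response below shows, but passing the plain strings is the simpler option.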

In [17]:
print(response)
print(type(response.text))
print(response.text[:1000])

<Response [200]>
<class 'str'>
{"businesses": [{"id": "FVzl8rDPiTWEtrNEuCu-Xg", "alias": "storyville-coffee-company-seattle-9", "name": "Storyville Coffee Company", "image_url": "https://s3-media3.fl.yelpcdn.com/bphoto/nJgiyjMZ7sglAtc5wyKSLQ/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/storyville-coffee-company-seattle-9?adjust_creative=GfrfRbyX2iYO3M0rvKOQeg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=GfrfRbyX2iYO3M0rvKOQeg", "review_count": 1590, "categories": [{"alias": "coffee", "title": "Coffee & Tea"}], "rating": 4.5, "coordinates": {"latitude": 47.6088847, "longitude": -122.3403957}, "transactions": ["pickup", "delivery"], "price": "$$", "location": {"address1": "94 Pike St", "address2": "Ste 34", "address3": "Pike Place Market", "city": "Seattle", "zip_code": "98101", "country": "US", "state": "WA", "display_address": ["94 Pike St", "Ste 34", "Pike Place Market", "Seattle, WA 98101"]}, "phone": "+12067805777", "display_phone": "(206) 78

In [19]:
response.json()['region']

{'center': {'longitude': -122.33551025390625, 'latitude': 47.62541904760501}}

## Pagination

Now that you have an initial response, you can examine the contents of the JSON container. For example, you might start with ```response.json().keys()```. Here, you'll see a key for `'total'`, which tells you the full number of matching results given your query parameters. Write a loop (or ideally a function) that makes successive API calls using the offset parameter to retrieve all of the results for the original query (or the first 1000, the API's cap, for a particularly large result set). As you do this, be mindful of how you store the data. Your final goal will be to reformat the data concerning the businesses themselves from the JSON objects into a pandas DataFrame.
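
One possible shape for such a function is sketched below. To keep the example runnable without an API key, the HTTP call is passed in as `fetch_page`, a hypothetical callable that takes an offset and a limit and returns the parsed JSON as a dictionary; in the lab it would wrap `requests.get`. The names `paginate` and `fake_fetch` are likewise assumptions for illustration.

```python
def paginate(fetch_page, page_size=50, max_results=1000):
    """Collect businesses page by page until the result set is exhausted.

    fetch_page(offset, limit) must return a dict with 'total' and
    'businesses' keys, mirroring the search response described above.
    """
    businesses = []
    offset = 0
    while offset < max_results:
        page = fetch_page(offset, page_size)
        businesses.extend(page['businesses'])
        reachable = min(page['total'], max_results)
        offset += page_size
        # Stop when every reachable result is collected or a page is empty.
        if offset >= reachable or not page['businesses']:
            break
    return businesses


def fake_fetch(offset, limit):
    """Stand-in for the API: pretends the query matches 120 businesses."""
    matches = [{'id': i} for i in range(120)]
    return {'total': len(matches), 'businesses': matches[offset:offset + limit]}


results = paginate(fake_fetch)
print(len(results))  # 120
```

Collecting plain dictionaries in a list and calling `pd.DataFrame(results)` once at the end avoids concatenating a DataFrame on every page.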

**Note: be mindful of the API rate limits. You can only make 5000 requests per day, and you can also be blocked for making requests too quickly. Start by prototyping with a small number of requests before running a loop that could be faulty. You can also use time.sleep(n) to add delays. For more details see https://www.yelp.com/developers/documentation/v3/rate_limiting.**
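
A simple way to stay under the limits is to pause between calls and back off when the server signals throttling. The sketch below assumes a throttled request comes back with HTTP status 429, which should be checked against the rate limiting documentation linked above. `call_with_backoff` and `send` are hypothetical names; `send` stands in for whatever function issues the request and returns a status code and payload.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it stops reporting status 429, waiting longer each time.

    send() returns a (status_code, payload) tuple. The delay doubles after
    every throttled attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(max_attempts):
        status, payload = send()
        if status != 429:  # assumed "too many requests" status
            return status, payload
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, payload


# Simulated server that throttles the first two calls, then succeeds.
replies = iter([(429, None), (429, None), (200, {'businesses': []})])
waits = []  # records the requested delays instead of actually sleeping
status, payload = call_with_backoff(lambda: next(replies), sleep=waits.append)
print(status, waits)  # 200 [1.0, 2.0]
```

Passing `sleep` as a parameter lets the example run instantly here; in real use the default `time.sleep` would apply.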

In [7]:
# Your code here; use a function or loop to retrieve all the results from your original request

#write for loop that gets 50 results at a time until 3400 (may need to limit how quickly each get request is sent)
#include error processing
#process the business part of the response into a pandas DataFrame
#concatenate each 50 results onto the DataFrame



In [3]:
import pandas as pd
import requests
import time

# Yelp caps any single search at 1000 results, so the loop below stops there.

def get_yelp_results(searchterm, searchlocation):
    """Return Yelp search results for a search term and location
       as a pandas DataFrame, paging through them 50 at a time."""
    SEARCH_LIMIT = 50   # maximum number of results per request
    MAX_RESULTS = 1000  # maximum number of results per search
    offset = 0

    url = 'https://api.yelp.com/v3/businesses/search'

    headers = {
        'Authorization': 'Bearer {}'.format(api_key),
        }

    url_params = {
                'term': searchterm.replace(' ', '+'),
                'location': searchlocation.replace(' ', '+'),
                'limit': SEARCH_LIMIT,
                'offset': offset
                }

    initialresponse = requests.get(url, headers=headers, params=url_params)
    if initialresponse.status_code != 200:
        print("Error: unable to retrieve businesses. Server responded with status code", initialresponse.status_code)
        return pd.DataFrame()

    total = initialresponse.json()['total']
    businesses = pd.DataFrame.from_dict(initialresponse.json()['businesses'])

    while len(businesses) < total and len(businesses) < MAX_RESULTS:
        offset += SEARCH_LIMIT
        url_params['offset'] = offset  # only the offset changes between pages
        response = requests.get(url, headers=headers, params=url_params)
        if response.status_code == 200:
            bus50 = pd.DataFrame.from_dict(response.json()['businesses'])
            if bus50.empty:
                # No more results even though 'total' suggested otherwise
                break
            businesses = pd.concat([businesses, bus50], ignore_index = True)
            print(offset, "businesses returned")
            time.sleep(0.5)  # pause between requests to respect the rate limit

        elif response.status_code == 400:
            print("Reached max number of businesses returned")
            break

        else:
            print("Error: unable to retrieve businesses. Server responded with status code", response.status_code)
            break
    return businesses
        

In [21]:
tulsapizza = get_yelp_results(searchterm="pizza", searchlocation="Tulsa OK")

50 number of businesses returned
100 number of businesses returned
150 number of businesses returned
200 number of businesses returned
250 number of businesses returned
300 number of businesses returned
350 number of businesses returned


In [4]:
seattlecoffee = get_yelp_results("coffee", "Seattle WA")

50 businesses returned
100 businesses returned
150 businesses returned
200 businesses returned
250 businesses returned
300 businesses returned
350 businesses returned
400 businesses returned
450 businesses returned
500 businesses returned
550 businesses returned
600 businesses returned
650 businesses returned
700 businesses returned
750 businesses returned
800 businesses returned
850 businesses returned
900 businesses returned
950 businesses returned


## Exploratory Analysis

Take the restaurants from the previous question and do an initial exploratory analysis. At minimum, this should include looking at the distribution of features such as price, rating and number of reviews as well as the relations between these dimensions.
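
Price arrives as a string of dollar signs and can be missing, which is why the analysis below drops missing prices before counting characters. The sketch below is one way to do the conversion with the standard library. `price_level` is a hypothetical helper, and the sample values are made up for illustration rather than taken from the results.

```python
from collections import Counter
import math


def price_level(price):
    """Map '$'..'$$$$' to 1..4; return None for missing values (None or NaN)."""
    if price is None or (isinstance(price, float) and math.isnan(price)):
        return None
    return len(price)


sample_prices = ['$', '$$', None, '$', float('nan'), '$$$']  # illustrative only
levels = [price_level(p) for p in sample_prices]
known = [lvl for lvl in levels if lvl is not None]

print(levels)                   # [1, 2, None, 1, None, 3]
print(Counter(known))           # Counter({1: 2, 2: 1, 3: 1})
print(sum(known) / len(known))  # 1.75
```

With the lab's DataFrame, the same helper could be applied with `seattlecoffee['price'].map(price_level)` to obtain a numeric column for comparison with rating and review count.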

In [5]:
seattlecoffee.head()

Unnamed: 0,alias,categories,coordinates,display_phone,distance,id,image_url,is_closed,location,name,phone,price,rating,review_count,transactions,url
0,storyville-coffee-company-seattle-9,"[{'alias': 'coffee', 'title': 'Coffee & Tea'}]","{'latitude': 47.6088847, 'longitude': -122.340...",(206) 780-5777,1874.645904,FVzl8rDPiTWEtrNEuCu-Xg,https://s3-media3.fl.yelpcdn.com/bphoto/nJgiyj...,False,"{'address1': '94 Pike St', 'address2': 'Ste 34...",Storyville Coffee Company,12067805777,$$,4.5,1590,"[delivery, pickup]",https://www.yelp.com/biz/storyville-coffee-com...
1,half-and-half-doughnut-seattle,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...","{'latitude': 47.61425, 'longitude': -122.32478}",(206) 325-5509,1479.893518,I3R01dj_8HzFb57VBmt30Q,https://s3-media2.fl.yelpcdn.com/bphoto/DEnDz2...,False,"{'address1': '516 E PIke St', 'address2': None...",Half and Half Doughnut,12063255509,$,4.0,29,[],https://www.yelp.com/biz/half-and-half-doughnu...
2,seattle-coffee-works-seattle,"[{'alias': 'coffeeroasteries', 'title': 'Coffe...","{'latitude': 47.6101133, 'longitude': -122.340...",(206) 340-8867,1743.611549,RwCpDB4C7CCFQvFxH1ve1g,https://s3-media1.fl.yelpcdn.com/bphoto/nDzPH0...,False,"{'address1': '108 Pine St', 'address2': None, ...",Seattle Coffee Works,12063408867,$,4.5,855,[],https://www.yelp.com/biz/seattle-coffee-works-...
3,moore-coffee-shop-seattle,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...","{'latitude': 47.611705, 'longitude': -122.3412...",(206) 883-7044,1584.516419,qKkHnsG-f4-BhdkR6yRrgw,https://s3-media4.fl.yelpcdn.com/bphoto/HTHJUX...,False,"{'address1': '1930 2nd Ave', 'address2': None,...",Moore Coffee Shop,12068837044,$,4.5,1041,[],https://www.yelp.com/biz/moore-coffee-shop-sea...
4,starbucks-reserve-roastery-seattle-seattle,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...","{'latitude': 47.6140014481935, 'longitude': -1...",(206) 624-0173,1387.001522,6ZKNFPLWRIVWshUkMNlgng,https://s3-media1.fl.yelpcdn.com/bphoto/F8afyU...,False,"{'address1': '1124 Pike St', 'address2': '', '...",Starbucks Reserve Roastery Seattle,12066240173,$$,4.5,2782,[],https://www.yelp.com/biz/starbucks-reserve-roa...


In [6]:
prices = seattlecoffee['price'].dropna()

In [7]:
prices = pd.Series([len(x) for x in prices])

In [8]:
print(prices.mean(), prices.median(), prices.mode())

1.4162735849056605 1.0 0    1
dtype: int64


In [9]:
print(seattlecoffee['rating'].mean(), seattlecoffee['rating'].median(), seattlecoffee['rating'].mode())

4.139 4.0 0    4.0
dtype: float64


In [12]:
seattlecoffee['rating'].describe()

count    1000.000000
mean        4.139000
std         0.514727
min         2.000000
25%         4.000000
50%         4.000000
75%         4.500000
max         5.000000
Name: rating, dtype: float64


In [13]:
print(seattlecoffee['review_count'].mean(), seattlecoffee['review_count'].median(), seattlecoffee['review_count'].mode())

197.036 75.5 0    1
dtype: int64


In [14]:
seattlecoffee['review_count'].describe()

count    1000.000000
mean      197.036000
std       408.606081
min         1.000000
25%        25.000000
50%        75.500000
75%       194.250000
max      5994.000000
Name: review_count, dtype: float64

## Mapping

Look at the initial Yelp example and try to make a map using Folium of the businesses you retrieved. Be sure to also add popups to the markers giving some basic information such as name, rating and price.
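
Some businesses may have no price listed, in which case a plain f-string would show `nan` in the popup. The sketch below builds the popup text defensively. `popup_text` is a hypothetical helper, and escaping the name with `html.escape` is a precaution based on the assumption that Folium renders popup strings as HTML.

```python
import html
import math


def popup_text(name, price, rating):
    """Return a short HTML-safe label such as 'Name, Price: $$, Rating: 4.5'."""
    missing = price is None or (isinstance(price, float) and math.isnan(price))
    shown_price = 'n/a' if missing else price
    return f'{html.escape(str(name))}, Price: {shown_price}, Rating: {rating}'


print(popup_text('Storyville Coffee Company', '$$', 4.5))
# Made-up name containing characters that need escaping, with a missing price.
print(popup_text("Joe's <Cafe>", float('nan'), 4.0))
```

In a marker loop, the result could be passed as `popup=popup_text(row['name'], row['price'], row['rating'])`.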

In [23]:
# Your code here
import folium

m = folium.Map(
    location = [47.62541904760501 , -122.33551025390625],
    zoom_start = 12)

m

In [26]:
#Add lat/long columns from coordinates column
seattlecoffee['latitude'] = seattlecoffee['coordinates'].apply(lambda x: x['latitude'])
seattlecoffee['longitude'] = seattlecoffee['coordinates'].apply(lambda x: x['longitude'])

In [27]:
seattlecoffee.head()

Unnamed: 0,alias,categories,coordinates,display_phone,distance,id,image_url,is_closed,location,name,phone,price,rating,review_count,transactions,url,latitude,longitude
0,storyville-coffee-company-seattle-9,"[{'alias': 'coffee', 'title': 'Coffee & Tea'}]","{'latitude': 47.6088847, 'longitude': -122.340...",(206) 780-5777,1874.645904,FVzl8rDPiTWEtrNEuCu-Xg,https://s3-media3.fl.yelpcdn.com/bphoto/nJgiyj...,False,"{'address1': '94 Pike St', 'address2': 'Ste 34...",Storyville Coffee Company,12067805777,$$,4.5,1590,"[delivery, pickup]",https://www.yelp.com/biz/storyville-coffee-com...,47.608885,-122.340396
1,half-and-half-doughnut-seattle,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...","{'latitude': 47.61425, 'longitude': -122.32478}",(206) 325-5509,1479.893518,I3R01dj_8HzFb57VBmt30Q,https://s3-media2.fl.yelpcdn.com/bphoto/DEnDz2...,False,"{'address1': '516 E PIke St', 'address2': None...",Half and Half Doughnut,12063255509,$,4.0,29,[],https://www.yelp.com/biz/half-and-half-doughnu...,47.61425,-122.32478
2,seattle-coffee-works-seattle,"[{'alias': 'coffeeroasteries', 'title': 'Coffe...","{'latitude': 47.6101133, 'longitude': -122.340...",(206) 340-8867,1743.611549,RwCpDB4C7CCFQvFxH1ve1g,https://s3-media1.fl.yelpcdn.com/bphoto/nDzPH0...,False,"{'address1': '108 Pine St', 'address2': None, ...",Seattle Coffee Works,12063408867,$,4.5,855,[],https://www.yelp.com/biz/seattle-coffee-works-...,47.610113,-122.340567
3,moore-coffee-shop-seattle,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...","{'latitude': 47.611705, 'longitude': -122.3412...",(206) 883-7044,1584.516419,qKkHnsG-f4-BhdkR6yRrgw,https://s3-media4.fl.yelpcdn.com/bphoto/HTHJUX...,False,"{'address1': '1930 2nd Ave', 'address2': None,...",Moore Coffee Shop,12068837044,$,4.5,1041,[],https://www.yelp.com/biz/moore-coffee-shop-sea...,47.611705,-122.341253
4,starbucks-reserve-roastery-seattle-seattle,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...","{'latitude': 47.6140014481935, 'longitude': -1...",(206) 624-0173,1387.001522,6ZKNFPLWRIVWshUkMNlgng,https://s3-media1.fl.yelpcdn.com/bphoto/F8afyU...,False,"{'address1': '1124 Pike St', 'address2': '', '...",Starbucks Reserve Roastery Seattle,12066240173,$$,4.5,2782,[],https://www.yelp.com/biz/starbucks-reserve-roa...,47.614001,-122.328058


In [33]:
#make markers
for index, row in seattlecoffee.iterrows():
    name = row['name']
    price = row['price']
    rating = row['rating']
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        tooltip = 'Cafe Info',
        popup = (f'{name}, Price: {price}, Rating: {rating}'),
        icon=folium.Icon(icon='info-sign')
    ).add_to(m)

In [34]:
m

## Summary

Nice work! In this lab, you made multiple API calls to Yelp to paginate through a result set, performed some basic exploratory analysis, and created an interactive map with Folium to display the results. Well done!