# API Pagination

You've now worked with some API calls, but we have yet to see how to retrieve a more complete dataset programmatically. Returning to the Yelp API, the [documentation](https://www.yelp.com/developers/documentation/v3/business_search) also describes the API's limits. These typically cover the number of requests a user is allowed to make within a specified time period and the maximum number of results returned. In this case, we are told that a single request returns at most 50 results and defaults to 20. Furthermore, any search is limited to a total of 1000 results. To retrieve all 1000 of these results, we would have to page through them piece by piece, retrieving 50 at a time. Processes such as these are often referred to as pagination. Without further ado, let's take a look in practice.
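As a quick check of the arithmetic, the short sketch below works out how many requests a full 1000-result search needs at 50 results per request, and which starting positions those requests would use. The constant names are illustrative and are not part of the Yelp API.

```python
import math

MAX_PER_REQUEST = 50   # largest number of results a single request can return
MAX_TOTAL = 1000       # most results any single search will return

# Number of requests needed to cover the full result set
n_requests = math.ceil(MAX_TOTAL / MAX_PER_REQUEST)

# Starting position of each page of results
offsets = list(range(0, MAX_TOTAL, MAX_PER_REQUEST))

print(n_requests)   # 20
print(offsets[:4])  # [0, 50, 100, 150]
print(offsets[-1])  # 950
```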

In [4]:
import requests
import pandas as pd

In [5]:
# client_id = #your id here
# api_key = #your key here

term = 'Mexican'
location = 'Astoria NY'
SEARCH_LIMIT = 10

url = 'https://api.yelp.com/v3/businesses/search'

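headers = {
    'Authorization': 'Bearer {}'.format(api_key),
}

# requests URL-encodes query parameters itself, so pass the raw strings;
# a literal '+' would be sent as '%2B' rather than as a space
url_params = {
    'term': term,
    'location': location,
    'limit': SEARCH_LIMIT,
}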
response = requests.get(url, headers=headers, params=url_params)
print(response)
print(type(response.text))
print(response.text[:1000])

NameError: name 'api_key' is not defined

# Previewing the Results

As before, let's briefly investigate the top-level structure of our JSON response.

In [6]:
response.json().keys()

NameError: name 'response' is not defined

Navigating down a level, we might be curious how many restaurants fit our criteria:

In [7]:
response.json()['total']

NameError: name 'response' is not defined

Of those, we have only retrieved the first 10 results (the search limit we provided). As the documentation describes, the limit defaults to 20 and can be as high as 50. Observe:

In [8]:
print(len(response.json()['businesses']))

NameError: name 'response' is not defined

Recall that we can also turn this into our usual Pandas DataFrame:

In [9]:
df = pd.DataFrame(response.json()['businesses'])
print(len(df))
df.head()

NameError: name 'response' is not defined

We could easily change our request slightly to retrieve more results at a time.

In [10]:
# client_id = #your id here
# api_key = #your key here

term = 'Mexican'
location = 'Astoria NY'
SEARCH_LIMIT = 50

url = 'https://api.yelp.com/v3/businesses/search'

headers = {
    'Authorization': 'Bearer {}'.format(api_key),
}

# Raw strings are fine here: requests URL-encodes query parameters itself
url_params = {
    'term': term,
    'location': location,
    'limit': SEARCH_LIMIT,
}
response = requests.get(url, headers=headers, params=url_params)
print(response)
print(type(response.text))
print(response.text[:1000])

NameError: name 'api_key' is not defined

We still have the same number of matching results, but this time we have been given the first 50:

In [11]:
response.json()['total']

NameError: name 'response' is not defined

In [12]:
print(len(response.json()['businesses']))

NameError: name 'response' is not defined

If we want to retrieve more of the results, we page through them using the offset parameter, which tells the API how many results to skip:

In [13]:
# client_id = #your id here
# api_key = #your key here

term = 'Mexican'
location = 'Astoria NY'
SEARCH_LIMIT = 50

url = 'https://api.yelp.com/v3/businesses/search'

headers = {
    'Authorization': 'Bearer {}'.format(api_key),
}

# Raw strings are fine here: requests URL-encodes query parameters itself
url_params = {
    'term': term,
    'location': location,
    'limit': SEARCH_LIMIT,
    'offset': 50,  # skip the first 50 results
}
response2 = requests.get(url, headers=headers, params=url_params)
print(response2)
print(type(response2.text))
print(response2.text[:1000])

NameError: name 'api_key' is not defined
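The offset in the request above follows a simple pattern: each page starts where the previous one ended. A minimal sketch of that relationship is shown below. `page_params` is a hypothetical helper rather than part of the Yelp API or `requests`, and it passes the search strings through unmodified on the assumption that `requests` handles the URL encoding.

```python
def page_params(term, location, page, limit=50):
    """Build query parameters for a zero-based page number (hypothetical helper)."""
    return {
        'term': term,
        'location': location,
        'limit': limit,
        'offset': page * limit,  # page 0 -> 0, page 1 -> 50, page 2 -> 100, ...
    }


# Page 1 corresponds to the request above: skip the first 50 results
print(page_params('Mexican', 'Astoria NY', page=1))
```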

# Practice

With that, you should have the basics to retrieve the full result set!

## API Call Function

Use the example above to write a function to retrieve all of the results (up to the maximum 1000 provided by Yelp) for a given search. Your function should then return these results as a single Pandas DataFrame.

In [None]:
#Your code here


### Pseudocode outline: 

The function should take in URL parameters in some form. From there, the first call should check the total number of results, and successive API calls should use the offset parameter to cycle through them. Each response should be stored as a DataFrame, and the DataFrames should then be stitched together.

Warning: Making too many API calls in quick succession can lead to errors. Most APIs require you to keep your requests below a rate limit. Use the time.sleep() function from the time module to pause briefly (~5 seconds is more than sufficient) between successive calls.
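One possible shape for the paging loop is sketched below. It is not a definitive solution: `collect_businesses` and `fetch_page` are hypothetical names, and `fetch_page(offset, limit)` is assumed to perform one API request and return the decoded JSON with `'total'` and `'businesses'` keys. A fake fetcher stands in for the real API so the sketch runs without credentials, and the function returns a plain list of records rather than a DataFrame.

```python
import time


def collect_businesses(fetch_page, limit=50, max_results=1000, pause=5.0,
                       sleep=time.sleep):
    """Page through a search and return every business record as a list.

    fetch_page(offset, limit) is a hypothetical callable that makes one API
    request and returns a dict with 'total' and 'businesses' keys.
    """
    first = fetch_page(0, limit)
    businesses = list(first['businesses'])
    # Never request more than the API reports, or more than its overall cap
    n_available = min(first['total'], max_results)

    for offset in range(limit, n_available, limit):
        sleep(pause)  # brief pause between successive calls
        page = fetch_page(offset, min(limit, n_available - offset))
        if not page['businesses']:
            break  # stop early if the API runs out of results
        businesses.extend(page['businesses'])
    return businesses


# Stand-in for the real API so the sketch runs offline: 120 fake records
FAKE_RESULTS = [{'id': i} for i in range(120)]


def fake_fetch(offset, limit):
    return {'total': len(FAKE_RESULTS),
            'businesses': FAKE_RESULTS[offset:offset + limit]}


records = collect_businesses(fake_fetch, pause=0)
print(len(records))  # 120
```

With real credentials, `fetch_page` could wrap the request from the cells above, for example `lambda offset, limit: requests.get(url, headers=headers, params={**url_params, 'offset': offset, 'limit': limit}).json()`, and `pd.DataFrame(records)` would turn the combined list into a single DataFrame.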

## Neighborhood ______ Restaurants

Use your function above to retrieve all of the restaurants for a particular cuisine in a neighborhood of your choice.

In [None]:
#Your code here

## Exploratory Analysis

Take the restaurants from the previous question and do an initial exploratory analysis. At a minimum, this should include looking at the distribution of features such as price, rating, and number of reviews, as well as the relationships between these dimensions.
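As a starting point, the sketch below shows the kind of summaries this might involve. The rows are made up purely for illustration, and the column names (`price`, `rating`, `review_count`) and the dollar-sign encoding of price are assumptions about what the API response contains, so check them against your own DataFrame.

```python
import pandas as pd

# Made-up rows for illustration only; real rows would come from the API
df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'price': ['$', '$$', None, '$$$'],
    'rating': [4.0, 3.5, 4.5, 5.0],
    'review_count': [120, 45, 10, 300],
})

# Convert the '$' strings to a numeric price level; missing prices stay missing
df['price_level'] = df['price'].str.len()

numeric_cols = ['price_level', 'rating', 'review_count']

# Distribution of each feature
print(df[numeric_cols].describe())
print(df['price'].value_counts(dropna=False))

# Pairwise relationships between the features
print(df[numeric_cols].corr())
```

From here, histograms (`df['rating'].hist()`) and scatter plots (`df.plot.scatter(x='review_count', y='rating')`) are natural next steps if matplotlib is available.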

In [None]:
#Your code here

## Mapping

Look at the initial Yelp example and try to make a map of the restaurants you retrieved using Folium.
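A minimal sketch of one way to approach this is shown below. It assumes each business record carries a `coordinates` dictionary with `latitude` and `longitude` keys. The sample records are made up, and `marker_points` and `build_map` are hypothetical helper names. Folium is imported inside `build_map` so that the coordinate extraction can be tried even where Folium is not installed.

```python
def marker_points(businesses):
    """Return (latitude, longitude, name) tuples, skipping records without coordinates."""
    points = []
    for biz in businesses:
        coords = biz.get('coordinates') or {}
        lat, lon = coords.get('latitude'), coords.get('longitude')
        if lat is None or lon is None:
            continue
        points.append((lat, lon, biz.get('name', '')))
    return points


def build_map(points, zoom_start=14):
    """Create a Folium map centered on the mean of the points, one marker per point."""
    import folium  # imported here so marker_points works without folium installed

    center = [sum(p[0] for p in points) / len(points),
              sum(p[1] for p in points) / len(points)]
    fmap = folium.Map(location=center, zoom_start=zoom_start)
    for lat, lon, name in points:
        folium.Marker(location=[lat, lon], popup=name).add_to(fmap)
    return fmap


# Made-up records for illustration only
sample = [
    {'name': 'Taqueria Uno', 'coordinates': {'latitude': 40.7644, 'longitude': -73.9235}},
    {'name': 'Cantina Dos', 'coordinates': {'latitude': 40.7690, 'longitude': -73.9120}},
    {'name': 'No Location', 'coordinates': {'latitude': None, 'longitude': None}},
]

points = marker_points(sample)
print(points)
```

In a notebook, evaluating `build_map(points)` as the last expression in a cell displays the map; `build_map(points).save('map.html')` writes it to a file instead.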

In [None]:
#Your code here