# Exploring and Transforming JSON Schemas

## Do Now:
Here is some information about me:

* Name - Sean Abu Wilson
* Birthday - 02/06/1985
* Nicknames - Rod Woodson, Shirley Temple, Megaphone
* Interests - NBA, Justice Reform, Twitter, Fitness
---
* Name - Forest Polchow
* Birthday - 10/19/1992
* Nicknames - FoPo, Dorest, Polchi, Jungle
* Interests - Biking, Climbing, Foreign Movies

Take this data about us and create a Python data structure that you think would best store it.


In [None]:
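# one dict per person; birthdays kept as MM/DD/YYYY strings (a datetime.date also works)
people = [{'name': 'Sean Abu Wilson', 'dob': '02/06/1985',
           'nicknames': ['Rod Woodson', 'Shirley Temple', 'Megaphone'],
           'interests': ['NBA', 'Justice Reform', 'Twitter', 'Fitness']},
          {'name': 'Forest Polchow', 'dob': '10/19/1992',
           'nicknames': ['FoPo', 'Dorest', 'Polchi', 'Jungle'],
           'interests': ['Biking', 'Climbing', 'Foreign Movies']}]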

## Objectives
You will be able to:
* Explore unknown JSON schemas and transform them into another data structure
* Access and manipulate data inside a JSON file
* Pull data from an API and parse/transform the data

## Agenda

* Review JSON Schemas
* Practice transforming JSON files.
* Introduce APIs.
* Walk through how to make an API request.
* Practice making API requests and parsing the data.


### What is the difference between a JSON object and a Python dictionary?
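One way to explore this question: a Python dictionary is an in-memory object, while JSON is a text format for data. The standard library's `json` module converts between the two, and a round trip on a made-up dictionary (`record` below is purely illustrative) exposes some of the differences.

```python
import json

# A made-up Python dictionary using types that JSON represents differently
record = {'name': 'Forest Polchow', 'active': True, 'nickname': None,
          'scores': (1, 2), 3: 'three'}

# Serialize the dictionary to a JSON string (plain text)
text = json.dumps(record)
print(text)

# Parse the JSON text back into Python objects
restored = json.loads(text)
print(restored)
print(restored == record)  # False: the round trip changed some types
```

In the JSON text, `True` and `None` become `true` and `null`, the tuple becomes an array, and the integer key `3` becomes the string `"3"` (JSON object keys must be strings). Parsing the text back therefore does not reproduce the original dictionary exactly.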

## Loading the JSON file

As before, we begin by importing the json package, opening the file with Python's built-in `open()` function, and then loading the data with `json.load()`.

In [1]:
import json

In [2]:
# the with-block closes the file automatically once the data is loaded
with open('output.json') as f:
    data = json.load(f)

## Exploring JSON Schemas  

Recall that JSON files have a nested structure. The most granular level of raw data will be individual numbers (float/int), strings, booleans, and null values. These in turn are stored in the equivalents of Python lists and dictionaries. Because these containers can be nested inside one another, we'll start exploring by checking the type of our root object and then map out the hierarchy of the JSON file.

In [3]:
type(data)

dict

In [4]:
data

{'albums': {'href': 'https://api.spotify.com/v1/browse/new-releases?country=SE&offset=0&limit=20',
  'items': [{'album_type': 'single',
    'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/2RdwBSPQiwcmiDo9kixcl8'},
      'href': 'https://api.spotify.com/v1/artists/2RdwBSPQiwcmiDo9kixcl8',
      'id': '2RdwBSPQiwcmiDo9kixcl8',
      'name': 'Pharrell Williams',
      'type': 'artist',
      'uri': 'spotify:artist:2RdwBSPQiwcmiDo9kixcl8'}],
    'available_markets': ['AD',
     'AR',
     'AT',
     'AU',
     'BE',
     'BG',
     'BO',
     'BR',
     'CA',
     'CH',
     'CL',
     'CO',
     'CR',
     'CY',
     'CZ',
     'DE',
     'DK',
     'DO',
     'EC',
     'EE',
     'ES',
     'FI',
     'FR',
     'GB',
     'GR',
     'GT',
     'HK',
     'HN',
     'HU',
     'ID',
     'IE',
     'IS',
     'IT',
     'JP',
     'LI',
     'LT',
     'LU',
     'LV',
     'MC',
     'MT',
     'MX',
     'MY',
     'NI',
     'NL',
     'NO',
     'NZ',
    

As you can see, in this case, the first level of the hierarchy is a dictionary. Let's explore what keys are within this:

In [5]:
data.keys()

dict_keys(['albums'])

In this case, there is only a single key, 'albums', so we'll continue on down the pathway exploring and mapping out the hierarchy. Once again, let's start by checking the type of this nested data structure.

In [4]:
type(data['albums'])

dict

Another dictionary! So far, we have a dictionary within a dictionary. Once again, let's investigate what's inside this dictionary. (JSON calls the equivalent of a Python dictionary an *object*.)

In [5]:
data['albums'].keys()

dict_keys(['href', 'items', 'limit', 'next', 'offset', 'previous', 'total'])

At this point, things are starting to look something like this: 
<img src="json_diagram1.JPG" width=550>

Checking each value's type one at a time would take a while, so let's simplify this with a for loop:

In [6]:
for key in data['albums'].keys():
    print(key, type(data['albums'][key]))

href <class 'str'>
items <class 'list'>
limit <class 'int'>
next <class 'str'>
offset <class 'int'>
previous <class 'NoneType'>
total <class 'int'>


Adding this to our diagram we now have something like this:
<img src="json_diagram2.JPG" width=550>

Normally, you may not draw out the full diagram as done here, but it's a useful picture to keep in mind, and for complex schemas it can be worth mapping out. At this point, you probably also have a good idea of the general structure of the JSON file. However, there is still the list of items, which we could investigate further:
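For larger files, the type-checking we did by hand can be automated with a small recursive function. The sketch below (`map_schema` is our own helper, not a library function) walks dictionaries and lists and returns one `path: type` line per value. It assumes that the elements of a list share one structure, so it only explores the first element, and it is run on a made-up, heavily trimmed excerpt shaped like `output.json`.

```python
def map_schema(obj, path='data', max_list_items=1):
    """Return one 'path: type' line for every value in a parsed JSON object.

    Only the first `max_list_items` elements of each list are explored,
    assuming the elements of a list share the same structure.
    """
    lines = [f'{path}: {type(obj).__name__}']
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.extend(map_schema(value, f'{path}[{key!r}]', max_list_items))
    elif isinstance(obj, list):
        for i, item in enumerate(obj[:max_list_items]):
            lines.extend(map_schema(item, f'{path}[{i}]', max_list_items))
    return lines


# A made-up, heavily trimmed excerpt shaped like output.json
sample = {'albums': {'href': 'https://api.spotify.com/v1/browse/new-releases?country=SE&offset=0&limit=20',
                     'items': [{'album_type': 'single',
                                'artists': [{'name': 'Pharrell Williams',
                                             'id': '2RdwBSPQiwcmiDo9kixcl8'}]}],
                     'previous': None}}

print('\n'.join(map_schema(sample)))
```

On the real file, `print('\n'.join(map_schema(data)))` would print the same kind of map for the full response.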


In [8]:
type(data['albums']['items'])

list

In [10]:
len(data['albums']['items'])

2

In [9]:
data['albums']['items'][1]

{'album_type': 'single',
 'artists': [{'external_urls': {'spotify': 'https://open.spotify.com/artist/3TVXtAsR1Inumwj472S9r4'},
   'href': 'https://api.spotify.com/v1/artists/3TVXtAsR1Inumwj472S9r4',
   'id': '3TVXtAsR1Inumwj472S9r4',
   'name': 'Drake',
   'type': 'artist',
   'uri': 'spotify:artist:3TVXtAsR1Inumwj472S9r4'}],
 'available_markets': ['AD',
  'AR',
  'AT',
  'AU',
  'BE',
  'BG',
  'BO',
  'BR',
  'CH',
  'CL',
  'CO',
  'CR',
  'CY',
  'CZ',
  'DE',
  'DK',
  'DO',
  'EC',
  'EE',
  'ES',
  'FI',
  'FR',
  'GB',
  'GR',
  'GT',
  'HK',
  'HN',
  'HU',
  'ID',
  'IE',
  'IS',
  'IT',
  'JP',
  'LI',
  'LT',
  'LU',
  'LV',
  'MC',
  'MT',
  'MY',
  'NI',
  'NL',
  'NO',
  'NZ',
  'PA',
  'PE',
  'PH',
  'PL',
  'PT',
  'PY',
  'SE',
  'SG',
  'SK',
  'SV',
  'TR',
  'TW',
  'UY'],
 'external_urls': {'spotify': 'https://open.spotify.com/album/0geTzdk2InlqIoB16fW9Nd'},
 'href': 'https://api.spotify.com/v1/albums/0geTzdk2InlqIoB16fW9Nd',
 'id': '0geTzdk2InlqIoB16fW9Nd',
 'im

In [12]:
data['albums']['items'][0].keys()

dict_keys(['album_type', 'artists', 'available_markets', 'external_urls', 'href', 'id', 'images', 'name', 'type', 'uri'])

## Converting JSON to Alternative Data Formats
As you can see, the nested structure continues: our list of items is only 2 long, but each item is a dictionary with a large number of key-value pairs. For context, this is probably the data we're actually after from this file: it provides details about recently released albums. The JSON file as a whole is an example response from the Spotify API (more on that soon). So while the larger JSON tells us a lot about the response itself, our primary interest may simply be the list of dictionaries at data -> albums -> items. Let's preview this and see if we can transform it into our usual Pandas DataFrame.
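A minimal sketch of that transformation using `pandas.json_normalize`, run on trimmed copies of the two album items (only a few fields, with values taken from the outputs in this notebook) so that it does not depend on `output.json`. With the real file you would pass `data['albums']['items']` instead.

```python
import pandas as pd

# Trimmed copies of the two album items, keeping only a few fields from the outputs above
items = [
    {'album_type': 'single', 'id': '5ZX4m5aVSmWQ5iHAPQpT71', 'name': "Runnin'",
     'artists': [{'name': 'Pharrell Williams', 'id': '2RdwBSPQiwcmiDo9kixcl8'}]},
    {'album_type': 'single', 'id': '0geTzdk2InlqIoB16fW9Nd', 'name': 'Sneakin’',
     'artists': [{'name': 'Drake', 'id': '3TVXtAsR1Inumwj472S9r4'}]},
]

# One row per album: the nested 'artists' list is kept as a single list-valued column
albums_df = pd.json_normalize(items)
print(albums_df)

# One row per (album, artist) pair: record_path expands 'artists', meta copies
# album-level fields onto each row, and meta_prefix avoids clashing with the
# artist's own 'id' and 'name' columns
artists_df = pd.json_normalize(items, record_path='artists',
                               meta=['id', 'name'], meta_prefix='album_')
print(artists_df)
```

The second form is handy when a list nested inside each record (here, `artists`) is what you want to analyze.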

## Create a new data structure that only contains relevant information

In [None]:
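#target structure: a list with one dictionary per album (values show the expected types)
album_schema = [{'artist': str,
                 'artist_id': str,
                 'features': [{'artist_name': str, 'artist_id': str}],
                 'num_markets': int,
                 'album_id': str,
                 'album_type': str,
                 'album_name': str}]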

## Create a script that parses the JSON into the new structure

In [7]:
#for my final data structure, I want a list with one dictionary per album
#so I start by creating an empty list
albums = []

#now loop through each album and build a dictionary with its info
for album in data['albums']['items']:
    #create an empty dictionary to hold all of the data we want
    parsed_album = {}
    #assume the first artist is the primary artist and any following ones are features,
    #so use index 0 to grab the primary artist
    parsed_album['artist'] = album['artists'][0]['name']
    parsed_album['artist_id'] = album['artists'][0]['id']
    parsed_album['features'] = []
    #loop through all remaining artists and add them to the 'features' list
    for feature in album['artists'][1:]:
        #create a dictionary with the information for each feature
        parsed_feature = {}
        parsed_feature['artist_name'] = feature['name']
        parsed_feature['artist_id'] = feature['id']
        #append that dictionary to the 'features' list
        parsed_album['features'].append(parsed_feature)
    parsed_album['num_markets'] = len(album['available_markets'])
    parsed_album['album_id'] = album['id']
    #use 'album_type' (e.g. 'single'); 'type' is the object type and is always 'album'
    parsed_album['album_type'] = album['album_type']
    parsed_album['album_name'] = album['name']
    #now that we have filled in the data, append the dictionary to our albums list
    albums.append(parsed_album)


In [8]:
albums
#here we can look at our albums
#you will notice the features lists are empty because each album we parsed has only one artist

[{'artist': 'Pharrell Williams',
  'artist_id': '2RdwBSPQiwcmiDo9kixcl8',
  'features': [],
  'num_markets': 60,
  'album_id': '5ZX4m5aVSmWQ5iHAPQpT71',
  'album_type': 'single',
  'album_name': "Runnin'"},
 {'artist': 'Drake',
  'artist_id': '3TVXtAsR1Inumwj472S9r4',
  'features': [],
  'num_markets': 57,
  'album_id': '0geTzdk2InlqIoB16fW9Nd',
  'album_type': 'single',
  'album_name': 'Sneakin’'}]

## Summary

JSON files often have a deeply nested structure, which can require some initial investigation of the schema hierarchy before you become familiar with how the data is stored. Once that's done, it is important to identify what data you are looking to extract and then develop a strategy to transform it into your standard workflow (which will generally depend on Pandas DataFrames or NumPy arrays).

## What is an API?

An API (application programming interface) is a set of rules that lets one program request data or services from another. Most web APIs are accessed over HTTP, so working with them starts with making HTTP requests.

Dealing with HTTP requests can be a challenging task in any programming language. Python 2 shipped with two built-in modules, `urllib` and `urllib2`, to handle these requests (Python 3 reorganized them into `urllib.request` and related modules), but they can be confusing to use and their documentation is not always clear. As a result, even a simple HTTP request can take a lot of code.

To make things simpler, most developers prefer the easy-to-use third-party library `requests` over urllib/urllib2. It is an Apache 2.0-licensed HTTP library built on top of urllib3. With `requests` you can send HTTP requests using simple Python commands, access content such as web page headers, form data, files, and parameters, and read the response data easily.

![](logo.png)

Below is how you would install and import the requests library before making any requests. 
```python
# Uncomment and install requests if you don't have it already
# !pip install requests

# Import requests to working environment
import requests
```

In [10]:
# Code here 
import requests



## The `.get()` Method

Now that the requests library is available in our working environment, we can start making requests using `requests.get()` as shown below:
```python
### Making a request
resp = requests.get('https://www.google.com')
```

In [12]:
# Code here
resp = requests.get('https://www.google.com')

In [13]:
resp

<Response [200]>


GET is by far the most commonly used HTTP method. A GET request retrieves data from the resource identified by the URL.

## Status Codes
A request may not always succeed, so it's good practice to check the status code returned with the response. Here is how you would do this:
```python
# Check the returned status code
resp.status_code == requests.codes.ok
```

In [16]:
# Code here 
resp.status_code == requests.codes.ok

True

This is a good check to see whether our request was successful. Depending on the state of the web server, the client's access rights, and the availability of the requested information, a web server may return a number of different status codes. Wikipedia has an exhaustive list of these codes. [Check them out here](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
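To see what a given code means, Python's standard library includes `http.HTTPStatus`. The sketch below prints a few standard reason phrases and defines a hypothetical helper, `is_success`, that treats any 2xx code as success (a slightly broader check than comparing against `requests.codes.ok`, which is exactly 200).

```python
from http import HTTPStatus

# The standard library knows the reason phrase of every registered status code
for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase)


def is_success(code):
    """Hypothetical helper: True for any 2xx ('successful') status code."""
    return 200 <= code < 300
```

In `requests` itself, `resp.ok` is `True` for any status code below 400, and `resp.raise_for_status()` raises an `HTTPError` for 4xx and 5xx responses.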

## Response Contents
Once we know that our request was successful and we have a valid response, we can inspect the returned content using the `.text` attribute of the response object.
```python
print(resp.text)
```

In [17]:
# Code here 
print(resp.text)

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="V8gLsLyvsUg2eKCz8qCJDA==">(function(){window.google={kEI:'CmFcXM3dLITu_QbfzIZA',kEXPI:'0,1353746,58,6,1952,1016,1406,697,528,590,140,326,1123,116,234,30,695,253,279,806,19,530,528,4,103,1,130,2334963,329536,1294,12383,4855,32692,15247,867,10761,1402,5281,1100,853,2482,2,2,4609,2192,364,3319,5505,224,2218,260,3742,1089,2,14,260,575,835,284,2,1306,2432,58,2,1,3,1297,4323,3390,8,302,1267,774,2247,1410,3344,1139,5,2,2,1965,2489,104,465,556,2581,668,1050,1808,1129,268,81,7,1,2,25,447,1

This returns a lot of content that is hard for a human to read: it is raw, encoded HTML full of tags, scripts, and styling information that only a web browser really knows how to render. In later lessons we'll look at how to use **regular expressions** to clean this content and extract the pieces needed for analysis.

## Response Headers
The response to an HTTP request can contain many headers holding different bits of information. We can use the `.headers` attribute of the response object to access them, as shown below:

```python
# Read the header of the response - convert to dictionary for displaying k:v pairs neatly
dict(resp.headers)
```

In [18]:
# Code here 
dict(resp.headers)

{'Date': 'Thu, 07 Feb 2019 16:47:06 GMT',
 'Expires': '-1',
 'Cache-Control': 'private, max-age=0',
 'Content-Type': 'text/html; charset=ISO-8859-1',
 'P3P': 'CP="This is not a P3P policy! See g.co/p3phelp for more info."',
 'Content-Encoding': 'gzip',
 'Server': 'gws',
 'X-XSS-Protection': '1; mode=block',
 'X-Frame-Options': 'SAMEORIGIN',
 'Set-Cookie': '1P_JAR=2019-02-07-16; expires=Sat, 09-Mar-2019 16:47:06 GMT; path=/; domain=.google.com, NID=158=CJkwSlk-yb2Ham3a1CeqcFcFa9pfSmChdp4aBi9lePzeUhE7e99qQ0Spwt3ghSOJ-NOkTJU-LWIkVGHgWWCw2AJG8bFAtlEx_OYEGxCLQA-lsi2Da4jhR1VX6TcrMnO09JKsxn0O-VzvmOUmoTRInqL4jZlJEzh1f858fiJoJhI; expires=Fri, 09-Aug-2019 16:47:06 GMT; path=/; domain=.google.com; HttpOnly',
 'Alt-Svc': 'quic=":443"; ma=2592000; v="44,43,39"',
 'Transfer-Encoding': 'chunked'}

The headers are key-value pairs holding various pieces of information about the resource and the request. `resp.headers` is a case-insensitive dictionary, so `resp.headers['server']` and `resp.headers['Server']` return the same value. Let's read some of these values:

```python
# .get() returns None if a header is missing; this response uses chunked
# transfer encoding, so it carries no Content-Length header
print(resp.headers.get('Content-Length'))  # length of the response body, if sent
print(resp.headers['Date'])  # date the response was sent
print(resp.headers['server'])  # server type (gws = Google Web Server)
```
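`resp.headers` is a `requests.structures.CaseInsensitiveDict`, which is why `'server'` works even though the server sent `Server`. The sketch below builds one by hand from two header values copied from the Google response above, so it runs without a network request. It also shows why `dict(resp.headers)` is best kept for display: a plain `dict` keeps the original capitalization and loses case-insensitive lookup.

```python
from requests.structures import CaseInsensitiveDict

# Two header values copied from the response above; resp.headers has this same type
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=ISO-8859-1',
                               'Server': 'gws'})

print(headers['content-type'])        # lookups ignore case
print(headers['SERVER'])
print(headers.get('Content-Length'))  # .get() returns None instead of raising KeyError
print(dict(headers))                  # a plain dict keeps the original key case
```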

In [13]:
# Code here
print(resp.headers.get('Content-Length'))
print(resp.headers['Date'])
print(resp.headers['server'])

## Try `httpbin`
`httpbin.org` is a popular website for testing different HTTP operations and practicing with request-response cycles. Its `/get` endpoint echoes back the details of the request it received (query arguments, headers, origin IP address, and URL) as JSON, so we can parse the response with `r.json()` and inspect each part.

```python
r = requests.get('http://httpbin.org/get')

response = r.json()  
print(r.json())  
print(response['args'])  
print(response['headers'])  
print(response['headers']['Accept'])  
print(response['headers']['Accept-Encoding'])  
print(response['headers']['Connection'])  
print(response['headers']['Host'])  
print(response['headers']['User-Agent'])  
print(response['origin'])  
print(response['url'])  
```

In [19]:
# Code here
r = requests.get('http://httpbin.org/get')

response = r.json()  
print(r.json())  
print(response['args'])  
print(response['headers'])  
print(response['headers']['Accept'])  
print(response['headers']['Accept-Encoding'])  
print(response['headers']['Connection'])  
print(response['headers']['Host'])  
print(response['headers']['User-Agent'])  
print(response['origin'])  
print(response['url']) 

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.21.0'}, 'origin': '96.232.187.141', 'url': 'http://httpbin.org/get'}
{}
{'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.21.0'}
*/*
gzip, deflate
close
httpbin.org
python-requests/2.21.0
96.232.187.141
http://httpbin.org/get



Now let's read the headers of the response itself, as we did above. Header names are case-insensitive, so `'CONNECTION'` and `'content-length'` work too.

```python
print(r.headers['Access-Control-Allow-Credentials'])
print(r.headers['Access-Control-Allow-Origin'])
print(r.headers['CONNECTION'])
print(r.headers['content-length'])
print(r.headers['Content-Type'])
print(r.headers['Date'])
print(r.headers['server'])
print(r.headers.get('via'))  # .get() returns None if the response has no Via header
```

In [16]:
# Code here 

## Passing Parameters in GET
In some cases, you'll need to pass parameters along with your GET requests. These extra parameters usually take the form of a query string appended to the requested URL. To do this, pass the values in the `params` argument. Let's try to access information from `httpbin` with some user information.

Note: The user information is not getting authenticated at `httpbin` so any name/password will work fine. This is merely for practice. 

```python
credentials = {'user_name': 'doe', 'password': 'jane'}  
r = requests.get('http://httpbin.org/get', params=credentials)

print(r.url)  
print(r.text)  
```
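To see exactly which URL `requests` will call, you can build the request without sending it: `requests.Request(...).prepare()` returns a `PreparedRequest` whose `.url` already contains the encoded query string. This is a sketch for inspection only; no network call is made.

```python
import requests

credentials = {'user_name': 'doe', 'password': 'jane'}

# Build (but do not send) the same GET request to inspect the final URL
prepared = requests.Request('GET', 'http://httpbin.org/get', params=credentials).prepare()
print(prepared.method, prepared.url)
```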

In [11]:
# Code here 


## HTTP POST method 

The POST method sends data to the server in the body of the request rather than in the URL. A common use is uploading files: for example, a user may submit a form that includes fields for uploading a profile picture, a resume, and so on. Requests can send multiple files in a single request by passing them as a list of tuples of the form `(field_name, file_info)`.

```python
import requests

url = 'http://httpbin.org/post'
# open both files in a with-block so they are closed after the upload
with open('fi.png', 'rb') as png, open('fi2.jpeg', 'rb') as jpeg:
    file_list = [
        ('image', ('fi.png', png, 'image/png')),
        ('image', ('fi2.jpeg', jpeg, 'image/jpeg')),
    ]
    r = requests.post(url, files=file_list)
print(r.text)
```
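For contrast with GET parameters, here is a sketch of a POST carrying ordinary form fields (the `user_name` field is just an illustrative value). Building the request with `.prepare()` instead of sending it lets us inspect where the data ends up: in the body, not in the URL.

```python
import requests

# Build (but do not send) a POST request carrying a form field in its body
prepared = requests.Request('POST', 'http://httpbin.org/post',
                            data={'user_name': 'doe'}).prepare()

print(prepared.url)                      # no query string: the data is not in the URL
print(prepared.headers['Content-Type'])  # form encoding chosen by requests
print(prepared.body)                     # the encoded form fields
```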

In [10]:
# Code here

This was a brief introduction to sending requests to a web server and reading its responses without going through a web browser. Later we'll see how to pick out the required data elements from the contents of a web page for analysis.

# Using the Yelp API - Codealong

The previously deployed codealong on working with the Twitter API can be found [here](https://github.com/learn-co-curriculum/dsc-2-15-08-Twitter-API-tokens-codealong/tree/d297ab4dee806203cb861ce4f39b301c5189990f) (not relevant for new students).

## Introduction

Now that we've discussed HTTP requests and OAuth, it's time to practice applying those skills to a production-level API. In this codealong, we'll take you through the process of signing up for an OAuth token and then using it to make requests to the Yelp API!

## Objectives

You will be able to:

* Generate an OAuth token for the Yelp API
* Make requests using OAuth


## Generating Access Tokens

As discussed, in order to use many APIs, one needs to use OAuth which requires an access token. As such, our first step will be to generate this login information so that we can start making some requests.  

With that, let's go grab an access token from an API site and make some API calls!
Point your browser to this [Yelp page](https://www.yelp.com/developers/v3/manage_app) and start creating an app in order to obtain an API access token:


![](./images/yelp_app.png)

You can either sign in to an existing Yelp account, or create a new one, if needed.

On the page you see above, simply fill out some sample information such as "Flatiron Edu API Example" for the app name, or whatever floats your boat. Afterwards, you should be presented with an API key that you can use to make requests!

With that, it's time to start making some API calls!

In [20]:
import os

#As a general rule of thumb, don't store passwords or API keys in a main file like this!
#Instead, you would normally store them in a separate file like passwords.py which you would then import,
#or even better, as environment variables that can then be read in.
#Here we read them from environment variables (the names YELP_CLIENT_ID and YELP_API_KEY are our own choice).
client_id = os.environ.get('YELP_CLIENT_ID', 'YOUR_CLIENT_ID')  #Your client ID (as a string)
api_key = os.environ.get('YELP_API_KEY', 'YOUR_API_KEY')  #Your API key (as a string)

## An Example Request with OAuth <a id="oauth_request"></a>
https://www.yelp.com/developers/documentation/v3/get_started

In the next lesson, we'll dig further into how to read and translate online documentation like the link above. For now, let's look at an example request and break it down into its constituent parts:

In [21]:
term = 'Mexican'
location = 'Astoria NY'
SEARCH_LIMIT = 10

url = 'https://api.yelp.com/v3/businesses/search'

headers = {
        'Authorization': 'Bearer {}'.format(api_key),
    }

url_params = {
                'term': term.replace(' ', '+'),
                'location': location.replace(' ', '+'),
                'limit': SEARCH_LIMIT
            }
response = requests.get(url, headers=headers, params=url_params)
print(response)
print(type(response.text))
print(response.text[:1000])

<Response [200]>
<class 'str'>
{"businesses": [{"id": "AUyKmFjpaVLwc3awfUnqgQ", "alias": "chela-and-garnacha-astoria", "name": "Chela & Garnacha", "image_url": "https://s3-media1.fl.yelpcdn.com/bphoto/ChVbA1_xqLHFXL4Iyh84NA/o.jpg", "is_closed": false, "url": "https://www.yelp.com/biz/chela-and-garnacha-astoria?adjust_creative=bVX1Jsfp4dkIOqw5HOVplg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=bVX1Jsfp4dkIOqw5HOVplg", "review_count": 325, "categories": [{"alias": "mexican", "title": "Mexican"}, {"alias": "wine_bars", "title": "Wine Bars"}, {"alias": "breakfast_brunch", "title": "Breakfast & Brunch"}], "rating": 4.5, "coordinates": {"latitude": 40.7557171543477, "longitude": -73.927811292412}, "transactions": ["delivery", "pickup"], "price": "$$", "location": {"address1": "33-09 36th Ave", "address2": "", "address3": "", "city": "Astoria", "zip_code": "11106", "country": "US", "state": "NY", "display_address": ["33-09 36th Ave", "Astoria, NY 11106"]}, "phone": "+

## Breaking Down the Request

As you can see, there are three main parts to our request.  
  
They are:
* The url
* The header
* The parameters
  
The URL is simply the base URL given in the documentation (again, more details in the upcoming lesson).

The header is a dictionary of key-value pairs. In this case, we are using a fairly standard header used by many APIs. It has a strict form where 'Authorization' is the key and 'Bearer YourApiKey' is the value.

The parameters are the filters we wish to pass into the query. Like the header, they form key-value pairs, and the valid keys are described in the API documentation, which we'll look at shortly. When the request is made, these parameters are embedded into the URL as a query string. Because URLs cannot contain spaces, spaces have to be encoded, commonly as "+". Note that `requests` performs this encoding for you, so the manual `.replace(' ', '+')` in the code above is not strictly necessary; `requests` will in fact encode a literal "+" as `%2B`. (The header, by contrast, isn't embedded into the URL, so the space between 'Bearer' and YourApiKey is valid.)
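The encoding step can be checked with the standard library's `urllib.parse.urlencode`, which encodes spaces as `+` and percent-encodes reserved characters. `requests` encodes `params` in much the same way, so this sketch shows roughly what happens to the Yelp parameters above, including what a pre-inserted `+` turns into.

```python
from urllib.parse import urlencode

# The parameters from the Yelp example, before any manual replacement
url_params = {'term': 'Mexican', 'location': 'Astoria NY', 'limit': 10}
encoded = urlencode(url_params)
print(encoded)  # spaces become '+'

# If spaces were already replaced by hand, the '+' itself gets percent-encoded
pre_replaced = urlencode({'location': 'Astoria+NY'})
print(pre_replaced)
```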


## The Response

As before, our response object has both a status code and the data itself. With that, let's start with a little data exploration!

In [22]:
response.json().keys()


dict_keys(['businesses', 'total', 'region'])

Now let's go a bit further and start to preview what's stored in each of the values for these keys.


In [23]:
yelp_json = response.json()  #parse the JSON body once and reuse it
for key, value in yelp_json.items():
    print(key)
    print(type(value))  #What type is it?
    print('\n\n')  #Separate out data

businesses
<class 'list'>



total
<class 'int'>



region
<class 'dict'>





Let's continue to preview these further to get a little better acquainted.


In [26]:
yelp_data = response.json()
yelp_data['businesses'][:2]



[{'id': 'AUyKmFjpaVLwc3awfUnqgQ',
  'alias': 'chela-and-garnacha-astoria',
  'name': 'Chela & Garnacha',
  'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/ChVbA1_xqLHFXL4Iyh84NA/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/chela-and-garnacha-astoria?adjust_creative=bVX1Jsfp4dkIOqw5HOVplg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=bVX1Jsfp4dkIOqw5HOVplg',
  'review_count': 325,
  'categories': [{'alias': 'mexican', 'title': 'Mexican'},
   {'alias': 'wine_bars', 'title': 'Wine Bars'},
   {'alias': 'breakfast_brunch', 'title': 'Breakfast & Brunch'}],
  'rating': 4.5,
  'coordinates': {'latitude': 40.7557171543477, 'longitude': -73.927811292412},
  'transactions': ['delivery', 'pickup'],
  'price': '$$',
  'location': {'address1': '33-09 36th Ave',
   'address2': '',
   'address3': '',
   'city': 'Astoria',
   'zip_code': '11106',
   'country': 'US',
   'state': 'NY',
   'display_address': ['33-09 36th Ave', 'Astoria, NY 11106']},
  'pho

As you can see, we're primarily interested in the 'businesses' entry.
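Since the businesses are a list of dictionaries, one way to get them into a DataFrame is `pd.json_normalize`, which flattens nested dictionaries such as `coordinates` and `location` into dotted column names. The sketch uses a single hand-trimmed business copied from the output above so it runs without an API key; with the live response you would pass `yelp_data['businesses']` instead.

```python
import pandas as pd

# A trimmed copy of the first business from the output above (a few fields only)
businesses = [
    {'id': 'AUyKmFjpaVLwc3awfUnqgQ', 'name': 'Chela & Garnacha',
     'review_count': 325, 'rating': 4.5, 'price': '$$',
     'coordinates': {'latitude': 40.7557171543477, 'longitude': -73.927811292412},
     'location': {'city': 'Astoria', 'zip_code': '11106'}},
]

# json_normalize flattens nested dictionaries into dotted column names
df = pd.json_normalize(businesses)
print(df[['name', 'rating', 'review_count', 'coordinates.latitude']])
```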


## Summary <a id="sum"></a>

Congratulations! We've covered a lot here! We took some of your previous knowledge of HTTP requests and OAuth in order to leverage an enterprise API! Then we made some requests to retrieve information that came back in JSON format. We then transformed this data into a DataFrame using the Pandas package. In the next lab, we'll break down how to read API documentation and then put it all together to make a nifty map!

## Problem Introduction

You've now worked with some API calls, but we have yet to see how to retrieve a more complete dataset programmatically. Returning to the Yelp API, the [documentation](https://www.yelp.com/developers/documentation/v3/business_search) also provides details regarding the API limits. These often include the number of requests a user is allowed to make within a specified time period and the maximum number of results returned. In this case, we are told that each request returns at most 50 results (20 by default). Furthermore, any search is limited to a total of 1000 results. To retrieve all 1000 of these results, we would have to page through them piece by piece, retrieving 50 at a time. This process is often referred to as pagination.
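The arithmetic behind pagination can be sketched as a small helper (the name `page_offsets` is our own): given the `total` reported by the API, it lists the `offset` values to request, assuming 50 results per page and the 1000-result cap described above.

```python
def page_offsets(total, page_size=50, max_results=1000):
    """Return the offset values needed to page through `total` results.

    page_size and max_results default to the per-request maximum (50)
    and the overall cap (1000) described in the Yelp documentation.
    """
    last = min(total, max_results)
    return list(range(0, last, page_size))


print(page_offsets(120))   # three pages
print(page_offsets(5000))  # capped at 1000 results, i.e. 20 pages
```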

In this lab, you will define a search and then paginate over the results to retrieve all of the results. You'll then parse these responses as a DataFrame (for further exploration) and create a map using Folium to visualize the results geographically.

## Part I - Make the Initial Request

Start by making an initial request to the Yelp API. Your search must include at least 2 parameters: **term** and **location**. For example, you might search for pizza restaurants in NYC. The term and location are up to you, but make the request below.

In [24]:
#your code here

## Pagination

Now that you have an initial response, you can examine the contents of the JSON container. For example, you might start with `response.json().keys()`. Here, you'll see a key for `'total'`, which tells you the full number of matching results for your query parameters. Write a loop (or ideally a function) that makes successive API calls using the offset parameter to retrieve all of the results (up to the API's 1000-result cap) for the original query. As you do this, be mindful of how you store the data. Your final goal will be to reformat the data concerning the businesses themselves into a pandas DataFrame from the JSON objects.

**Note: be mindful of the API rate limits. You can only make 5000 requests per day, and requests can also be rejected if you make them too quickly. Start by prototyping on a small scale before running a loop that could be faulty. You can also use `time.sleep(n)` to add delays. For more details see https://www.yelp.com/developers/documentation/v3/rate_limiting.**
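A minimal sketch of a polite pagination loop, assuming you have written a function (called `fetch_page` here, a hypothetical name) that makes one request for a given offset and returns the list of businesses from that response. The loop pauses with `time.sleep` between calls; the one-second default delay is an arbitrary choice, not a value from the Yelp documentation. A fake fetcher stands in for the API so the loop can be tested safely first.

```python
import time


def fetch_all(fetch_page, offsets, delay=1.0):
    """Call fetch_page(offset) for each offset, pausing between calls.

    fetch_page is a hypothetical function you would write: it makes one
    API request with the given offset and returns a list of results.
    """
    results = []
    for i, offset in enumerate(offsets):
        if i > 0:
            time.sleep(delay)  # pause between requests to stay under the rate limit
        results.extend(fetch_page(offset))
    return results


# A fake fetcher for testing the loop without calling the API
def fake_fetch(offset):
    return [{'offset': offset}]


print(fetch_all(fake_fetch, [0, 50, 100], delay=0))
```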

In [25]:
#your code here

## Exploratory Analysis

Take the restaurants from the previous question and do an initial exploratory analysis. At minimum, this should include looking at the distribution of features such as price, rating, and number of reviews, as well as the relationships between these dimensions.

In [26]:
#your code here