# JSON and APIs

_August 11, 2020_

Agenda today:
- Introduction to API and Remote Server Model 
- Getting data through an API:
    - Github API
    - Yelp API
- Working with JSON files

In [2]:
import pandas as pd
import numpy as np
import requests
import json
from yelp.client import Client
import matplotlib.pyplot as plt
plt.style.use('seaborn')

ModuleNotFoundError: No module named 'yelp'

## Part I. APIs and Remote Server Model
API stands for Application Programming Interface. At some point or the other, large companies would build API for their products for their clients or internal use. It allows the company's application to communicate with another application. But what _exactly_ is an API?

#### Remote server 
When we think about the world of Web, we can think of it as a collection of _servers_. And servers are nothing but huge computers that store a huge amount of data from users and are optimized to process requests. For example, when you type in www.facebook.com, your browser sends a _request_ to the Facebook server, and gets a response from the server, thus interpreting the code and displaying your homepage. 

In this case, your browser is the _client_, and Facebook’s server is an API. To put it broadly, whenever you visit a website, you are interacting with its API. However, an API isn’t the same as the remote server — rather it is the part of the server that receives __requests__ and sends __responses__.

<img src='status-code.png' width = 500>

## Part II. Getting Data Through APIs

#### Github API
Github API is an example of API that does not need _authentication_. You can send `GET` requests to the API and receive information.  

The `get()` method send a request to Github's API, and stored information in a variable called `request`. Next, let's see if it's successful. 

In [3]:
request = requests.get('https://api.github.com')

In [4]:
## status code
request.status_code

200

In [11]:
# examine the body of the request
#type(request.text) we need to clean up this string by turning into a dictionary by telling python to read dictionary

request.text

'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","label_sear

In [10]:
# examine the content
request.content
#type(request.content)

b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","label_sea

In [12]:
# examine the headers
request.headers['server']

'GitHub.com'

In [14]:
# we can make use of the JSON library to efficiently load the data and manipulate it 
request_json = json.loads(request.text) #method to read it
#json.load json.dump 

In [15]:
request_json
#type(request_json)

{'current_user_url': 'https://api.github.com/user',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': '

In [16]:
# loop through the request_json file and examine the keys 

# how do you loop through the keys again?
for key in request_json.keys():
    print(key)

current_user_url
current_user_authorizations_html_url
authorizations_url
code_search_url
commit_search_url
emails_url
emojis_url
events_url
feeds_url
followers_url
following_url
gists_url
hub_url
issue_search_url
issues_url
keys_url
label_search_url
notifications_url
organization_url
organization_repositories_url
organization_teams_url
public_gists_url
rate_limit_url
repository_url
repository_search_url
current_user_repositories_url
starred_url
starred_gists_url
user_url
user_organizations_url
user_repositories_url
user_search_url


In [17]:
# can you think of a way in which you'd put them in a dataframe?
request_df = pd.DataFrame.from_dict(request_json, orient='index')
request_df.columns = ['url']
request_df.head()

Unnamed: 0,url
current_user_url,https://api.github.com/user
current_user_authorizations_html_url,https://github.com/settings/connections/applic...
authorizations_url,https://api.github.com/authorizations
code_search_url,https://api.github.com/search/code?q={query}{&...
commit_search_url,https://api.github.com/search/commits?q={query...


#### YELP API
Sometimes you need _authentication_ to get data from a service in additional to just sending a `GET()` request. Yelp API is a perfect example. 

You will need to go to the YELP's developer's [website](https://www.yelp.com/developers/v3/manage_app) and request for a client ID and API key, which function like a key into a house of data. 

<img src='yelp.png' width = 500>

In [18]:
# lets try to get some data from yelp!
url = 'https://api.yelp.com/v3/businesses/search'
response = requests.get(url) #NEED TO SPECIFY KEY

In [19]:
# check the status code
response.status_code

# what happened here?

400

In [20]:
#You have to use your API key to access the data!

MY_API_KEY = "https://api.yelp.com/v3/autocomplete?text=del&latitude=37.786882&longitude=-122.399972" # replace this with your API key!
client = Client(MY_API_KEY)

NameError: name 'Client' is not defined

In [23]:
# now we are ready to get our data 

# usually, services would limit you to a certain amount of API calls. This varies from service
# to service, so you have to watch out to it 

term = 'Axe Throwing'
location = 'Brooklyn'
SEARCH_LIMIT = 10

url = 'https://api.yelp.com/v3/businesses/search'

headers = {
        'Authorization': 'Bearer {}'.format(MY_API_KEY),
    }

url_params = {
                'term': term.replace(' ', '+'),
                'location': location.replace(' ', '+'),
                'limit': SEARCH_LIMIT
            }
response = requests.get(url, headers=headers, params=url_params)
print(response)
print(type(response.text))
print(response.text)

<Response [400]>
<class 'str'>
{"error": {"code": "VALIDATION_ERROR", "description": "'Bearer https://api.yelp.com/v3/autocomplete?text=del&latitude=37.786882&longitude=-122.399972' does not match '^(?i)Bearer [A-Za-z0-9\\\\-\\\\_]{128}$'", "field": "Authorization", "instance": "Bearer https://api.yelp.com/v3/autocomplete?text=del&latitude=37.786882&longitude=-122.399972"}}


In [24]:
# cleaning and formatting the data
axe_throwing = response.text
axe_throwing = json.loads(axe_throwing)

In [25]:
# cleaning and exploring the data
for key in axe_throwing.keys():
    print(key)

error


In [26]:
# examine the first element of businesses
axe_throwing['businesses'][0].keys()

KeyError: 'businesses'

In [None]:
#change it into a datafram by importing pandas     

In [None]:
axe_throwing_df = pd.DataFrame.from_dict(axe_throwing['businesses'])
axe_throwing_df

In [None]:
# you can do some analysis and visualization from here on! 

plt.hist(axe_throwing_df['review_count'], color='pink', alpha = 0.8)
plt.title('Axe Throwing reviewing count in Brooklyn')
plt.xlabel('Reviews')
plt.ylabel('Count')

In [None]:
df.df.review_count>200

In [None]:
# query the name of the axe throwing place with the highest review
axe_throwing_df.sort_values(by = 'review_count', ascending = False).name.reset_index(drop = True)[0]


In [None]:
# can you do some other queries using sql/pandas?

#### Resources
- [Getting Data from Reddit API](https://www.storybench.org/how-to-scrape-reddit-with-python/)
- [Twitch API](https://dev.twitch.tv/docs)