**PySDS Week 3 Lecture 1. V.1**
Last author: B. Hogan

# Week 3. Day 4. - API Access practice 

Learning goals: 
- GET vs POST requests 
- Authenticating OAuth 
- Paging through a query


# GET vs. POST requests 

Recall that previously we used a GET request to send a URL string to a server. Everything after the domain name was used to locate the right resource and to pass some details to the server, such as the arguments we placed in the argument string. When we type a URL into a browser we are likewise sending a GET request. 

A POST request also sends a URL to a server, along with a series of headers including a user-agent string. However, a POST request additionally carries a 'payload' (the request body), typically a set of key-value pairs. The keys say what kind of data is being sent and the values are the data itself. 

POST requests are better suited than GET requests to sending sensitive data. For example, a POST request should happen every time you click submit after entering some credentials. With GET, your username and password would appear as arguments in the URL string, and URLs end up in browser histories, bookmarks, and server logs. POST avoids that by putting these things in the payload rather than in the URL. Note that POST does not encrypt anything by itself; encryption in transit comes from HTTPS. If the connection is HTTP and not HTTPS, then neither the URL string nor the payload is encrypted, and anyone who happens to be sniffing traffic on the wifi will be able to see your username and password either way. 
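
To see where the data ends up in each case, the following sketch builds (but never sends) one GET and one POST request with `requests` and prints the result. The URL `https://example.com/login` and the credentials are placeholders made up for illustration, not a real endpoint.

```python
import requests

# Placeholder endpoint and credentials, for illustration only.
url = "https://example.com/login"
creds = {"user": "kermit", "password": "hunter2"}

# Build the requests without sending anything over the network.
get_req = requests.Request("GET", url, params=creds).prepare()
post_req = requests.Request("POST", url, data=creds).prepare()

print(get_req.url)    # the credentials are part of the URL
print(post_req.url)   # the URL carries no credentials
print(post_req.body)  # the credentials travel in the request body
```

Neither request is encrypted by this alone; that still depends on the `https://` in the URL.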

Example from TheTVDB. TheTVDB is an independent site of user-generated content (UGC). It is donation-based and has an API for access. It is not associated with IMDB. The site's data is licensed under Creative Commons 3.0. The nice thing about the site is that it is pretty austere and has a clear API (which is fairly new judging by the forums, and it shows with respect to its usability).  

When you log into the site, just as noted above, you have to fill out some details, namely your email and your password. Below is a snippet of the HTML code for that process. To see this yourself you can go to https://www.thetvdb.com/login and then right-click -> "show page source". The page source is pretty long, but this snippet is in the middle. 

~~~ html
<form method="post" action="https://www.thetvdb.com/login/authenticate/concrete">

	<div class="form-group">
		<label class="control-label">Email Address</label>
		<input name="uName" class="form-control" autofocus="autofocus" />
	</div>

	<div class="form-group">
		<label class="control-label">Password</label>
		<input name="uPassword" class="form-control" type="password" />
	</div>

	<div class="checkbox">
		<label>
			<input type="checkbox" name="uMaintainLogin" value="1">
			Stay signed in for two weeks		</label>
	</div>

	
	<div class="form-group">
		<button class="btn btn-primary">Log in</button>
		<a href="https://www.thetvdb.com/login/concrete/forgot_password" class="btn pull-right">Forgot Password</a>
	</div>

	<input type="hidden" name="ccm_token" value="1540511475:c69b8e1d766dc55e7576525a29355643" />
			<br/>
		<hr/>
		<a href="https://www.thetvdb.com/register" class="btn btn-block btn-success">Not a member? Register</a>
	
</form>
~~~    

The snippet shows that in order to log in, you have to click a button. The browser then sends a POST request to https://www.thetvdb.com/login/authenticate/concrete with the values of the form fields. It also sends the value of the hidden ccm_token field, which is there to prevent cross-site request forgery (CSRF). So, you see, POST happens all the time. 

We are going to have to create a POST request if we want to get an access token from TheTVDB's API. 
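
As a small sketch of how to read a form like this programmatically, the code below uses Python's standard-library `HTMLParser` to pull out the form's method, its action URL, and the names of its input fields. The HTML is a trimmed copy of the snippet above, and the class name `FormFields` is just a name chosen for this example.

```python
from html.parser import HTMLParser

# A trimmed copy of the login form shown above.
FORM_HTML = """
<form method="post" action="https://www.thetvdb.com/login/authenticate/concrete">
  <input name="uName" class="form-control" autofocus="autofocus" />
  <input name="uPassword" class="form-control" type="password" />
  <input type="checkbox" name="uMaintainLogin" value="1">
  <input type="hidden" name="ccm_token" value="1540511475:c69b8e1d766dc55e7576525a29355643" />
</form>
"""

class FormFields(HTMLParser):
    """Collect the form's method, action, and named input fields."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.method = attrs.get("method")
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            self.fields.append(attrs["name"])

parser = FormFields()
parser.feed(FORM_HTML)
print(parser.method, parser.action)
print(parser.fields)
```

The field names it finds are the keys of the payload that the browser sends when you click "Log in".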

In [None]:
print(len("<API-KEY>"))

When we go to TheTVDB's API page we are told that we need a token. They have a very handy on-site tester where you can fill in credentials and then submit. We will first create a token through this API. 

Notice that it produces the following request: 

~~~ bash
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{
  "apikey": "",
  "userkey": "",
  "username": "bernie.hogan4a5"
}' 'https://api.thetvdb.com/login'
~~~

This is a 'curl' request. Curl is a common tool for downloading data from the web. It has a lot of arguments and parameters. If you ran this from a terminal window (with some tweaking), it would print the response right in the window. We, however, are just going to use it to learn a few things, and then create our own request using the 'requests' library in Python. 

In [None]:
import requests

payload = {
    "apikey": "",
    "userkey": "",
    "username": "bernie.hogan4a5"}

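# json=payload serialises the dictionary as JSON; we also pass the headers
# so that the Content-Type matches the curl call above.
headers = {"Content-Type":"application/json"}

r = requests.post("https://api.thetvdb.com/login", json=payload, headers=headers)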
print(r)

In [None]:
r.json()

In [None]:
token = r.json()['token']

Now that we have our token we can use a series of GET requests to collect data from the API. The token is sent along with each request in an Authorization header. Notice that this time around (unlike with the Wikipedia example) we do not have to create the whole request by hand; ```requests``` lets us put the headers and arguments together more programmatically. But first we need to know what to ask for. 

No surprise, let's download data for The Muppet Show. Now this data should be familiar as it is the very first data that you worked with on day one. In fact, much of what we have done is meant to come back full circle now. In week one we used a database of the first four seasons of the Muppets, but notably there are five seasons. The data for the fifth one would not have come through the first API query. Instead we have to page through the results. Today we will page through those results and add the data to a ```DataFrame```. 

But first...how do we get this? Let's go over to the API tester and see what's available. 

We can see that the API says **'Series : Information about a specific series'**. Looks good; let's show that one. Underneath are a series of API endpoints, such as 
```get /series/{id}/episodes/summary```. Strictly speaking, these are not full URLs but URL paths: the part that comes after ```https://api.thetvdb.com/```. Together with some arguments in the argument string, they return data to an authenticated client. 

But how do we:
- Authorize ourselves on that page? (see demo - copy and paste the token into the browser)
- Get the series ID? (see demo - once we are authorized, the search endpoint will give us the series ID as a number in the JSON response). Hint: it is 72476


In [None]:
import requests

series_id = 72476
headers = {"Authorization":"Bearer %s" % token}
r = requests.get("https://api.thetvdb.com/series/%s/episodes" % series_id, headers=headers)
r

In [None]:
if r.status_code == 200:
    response_data = r.json()
else:
    print("Error: Status Code %s" % r.status_code)

In [None]:
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated

muppetTable = json_normalize(response_data["data"])
display(muppetTable)

In [None]:
display(muppetTable.tail())

Notice that the table includes five episodes from season 5. But these episodes were not included in your earlier data, and surely they aren't the only episodes from season 5? Indeed they are not: the JSON response carries paging information alongside the data. Observe:

In [None]:
response_data.keys()

In [None]:
response_data["links"]

In [None]:
series_id = 72476
headers = {"Authorization":"Bearer %s" % token}
print(headers)
r = requests.get("https://api.thetvdb.com/series/%s/episodes?page=2" % series_id, headers=headers)
r
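
The `?page=2` above was typed into the URL by hand. As a sketch of a more programmatic alternative, `requests` accepts a `params` dictionary and builds the argument string itself. The request below is only prepared, not sent, so it runs without a real token; the token value is a placeholder, and `https` is assumed to match the login call.

```python
import requests

token = "PLACEHOLDER-TOKEN"  # stands in for the token returned by the login call
series_id = 72476

# Prepare (but do not send) the request so we can inspect what would go out.
req = requests.Request(
    "GET",
    "https://api.thetvdb.com/series/%s/episodes" % series_id,
    params={"page": 2},
    headers={"Authorization": "Bearer %s" % token},
).prepare()

print(req.url)                       # the argument string was built for us
print(req.headers["Authorization"])  # the token travels in a header, not the URL
```

The same `params` and `headers` arguments can be passed straight to `requests.get` to send the request for real.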

In [None]:
if r.status_code == 200:
    response_data = r.json()
    print("Received data")

    muppetTable2 = json_normalize(response_data["data"])
    display(muppetTable2)

In [None]:
import pandas as pd 

total = pd.concat([muppetTable,muppetTable2])
display(total)

In [None]:
len(total[total["airedSeason"] != 0])
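
The concatenate-then-filter step can also be tried without the API. The sketch below uses two tiny made-up tables in place of the two pages; only the `airedSeason` column name comes from the data above, and the rows and the `episodeName` values are invented. It assumes, as the filter above suggests, that season 0 marks entries that are not regular episodes.

```python
import pandas as pd

# Toy stand-ins for two pages of results (made-up rows, not API output).
page1 = pd.DataFrame({"airedSeason": [0, 1, 1], "episodeName": ["Special", "A", "B"]})
page2 = pd.DataFrame({"airedSeason": [5, 5], "episodeName": ["C", "D"]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each page's index.
combined = pd.concat([page1, page2], ignore_index=True)

# Keep only the rows whose season is not 0.
regular = combined[combined["airedSeason"] != 0]
print(len(combined), len(regular))
```

Without `ignore_index=True` the combined table would contain duplicate index labels, one set per page, which can be confusing when selecting rows by label later.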

Now we can put it all together in a single workflow. 

In [None]:
import requests
import pandas as pd

def getToken(apikey,userkey,username):
    """Log in to TheTVDB and return a token, or None on failure."""
    payload = {
        "apikey": apikey,
        "userkey": userkey,
        "username": username}

    # json=payload serialises the dictionary and sets the JSON Content-Type header.
    r = requests.post("https://api.thetvdb.com/login", json=payload)
    if r.status_code == 200:
        return r.json()["token"]
    else:
        print("Error: Status Code %s" % r.status_code)
        return None

def getEpisodeList(series_id,token):
    """Page through all episodes of a series and return them as one DataFrame."""
    episode_list = []
    headers = {"Authorization":"Bearer %s" % token}
    url = "https://api.thetvdb.com/series/%s/episodes" % series_id

    page = 1

    while True:
        # Let requests build the ?page=... argument string.
        r = requests.get(url, headers=headers, params={"page": page})
        if r.status_code == 200:
            response_data = r.json()
            episode_list.append(pd.json_normalize(response_data["data"]))
            # links["next"] is empty on the last page.
            if response_data['links']["next"]:
                page = response_data['links']["next"]
            else:
                break
        else:
            print("Error: Status Code %s" % r.status_code)
            return None

    return pd.concat(episode_list, ignore_index=True)


token = getToken("",
                 "",
                 "bernie.hogan4a5")

df = getEpisodeList(72476,token)
print(len(df))
df.tail()

In [None]:
print(len(df))
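
The paging logic can be checked without touching the network by separating it from the code that fetches a page. The following is a minimal sketch under that assumption: `collect_pages` and `FAKE_PAGES` are hypothetical names, and the fake responses only mimic the `links` / `data` shape seen in the JSON above.

```python
def collect_pages(fetch_page, first_page=1):
    """Follow links["next"] until it is empty, gathering each page's rows."""
    rows = []
    page = first_page
    while page:
        response_data = fetch_page(page)
        rows.extend(response_data["data"])
        page = response_data["links"]["next"]  # an empty value ends the loop
    return rows

# Hypothetical two-page response, shaped like the JSON above.
FAKE_PAGES = {
    1: {"links": {"next": 2}, "data": [{"id": 1}, {"id": 2}]},
    2: {"links": {"next": None}, "data": [{"id": 3}]},
}

# dict.get serves as the page-fetching function here.
episodes = collect_pages(FAKE_PAGES.get)
print(len(episodes))
```

With the real API, `fetch_page` would be a small function that calls `requests.get` with the right headers and page number and returns `r.json()`.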