In [None]:
import warnings
warnings.filterwarnings('ignore')

## Exercise 4 
### APIs 

### Introduction

An API (Application Programming Interface) is a way for two different applications to communicate. While the term applies to any two programs, here we use it to refer to the API of a web service that provides data.

To retrieve data from an API, a request to a remote web server is made.

For example, if you want to build an application that plots stock prices, you could use the API of a service such as Google Finance to request the current stock prices.

APIs are useful where:
* Data is changing quickly, e.g. stock prices
* The whole dataset is not required, e.g. the tweets of one user
* Repeated computation is involved, e.g. Spotify API that tells you the genre of a piece of music

#### REST

Most APIs you come across will be RESTful, i.e. they provide a REST (REpresentational State Transfer) interface.

REST uses standard HTTP commands which means that getting data from an API is similar to accessing a webpage. 

For example, when you type `www.duckduckgo.com` in your browser, your browser is asking the `www.duckduckgo.com` server for a webpage by making a `GET` HTTP (Hypertext Transfer Protocol) request. Making a `GET` request to a RESTful API instead retrieves data (rather than a webpage).

Similarly, while your browser uses `POST` to submit the contents of a form, REST APIs typically use `POST` to create data.

REST APIs also use other HTTP methods such as `PUT` (for updating or replacing data) and `DELETE` (for removing data).

HTTP can return a response in any format (the format an API uses is typically given in its documentation), though data is more often than not returned as JSON text.

As they are used to retrieve data, `GET` requests are the most commonly used type of request, so we will restrict ourselves to `GET` in this tutorial.
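To make the link between HTTP methods and REST operations concrete, here is a minimal sketch that builds (but never sends) three requests using Python's standard-library `urllib.request`. The endpoint `https://api.example.com/items` and the `{"name": "Pizza"}` body are placeholders for illustration, not a real API.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only; nothing is sent over the network.
BASE_URL = "https://api.example.com/items"

# GET: ask the server for data (no request body).
get_req = urllib.request.Request(
    BASE_URL + "/1",
    headers={"Accept": "application/json"},
    method="GET",
)

# POST: send data to the server, here as a JSON-encoded body.
body = json.dumps({"name": "Pizza"}).encode("utf-8")
post_req = urllib.request.Request(
    BASE_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# DELETE: ask the server to remove a resource.
delete_req = urllib.request.Request(BASE_URL + "/1", method="DELETE")

for req in (get_req, post_req, delete_req):
    print(req.get_method(), req.full_url)
```

Only the method name and the optional body differ between the three; the URL identifies the resource being acted on.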

#### JSON

JSON (JavaScript Object Notation) is a format for sending data that is meant to be human-readable and easy to parse. It was derived from JavaScript but is language-independent.

It uses attribute-value pairs (similar to Python dictionaries, e.g. `{"name": "Pizza", "foodRanking": 1}`) and array data types (similar to Python lists, e.g. `[1, 2, 3]`).

Example JSON representation :
```
{
  "firstName": "Donald",
  "lastName": "Trump",
  "age": 73,
  "isAlive": true,
  "color": "orange",
  "addresses": [
      {
          "streetAddress": "1600 Pennsylvania Avenue NW",
          "city": "Washington, D.C.",
          "state": null,
          "postalCode": "20500",
          "country": "US"
      },
      {
          "streetAddress": "721 Fifth Avenue",
          "city": "NYC",
          "state": "NY",
          "postalCode": "10022",
          "country": "US"
      }
  ]
}
```
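In Python, the standard-library `json` module converts between JSON text and dictionaries/lists. The short sketch below uses a made-up record (the `toppings`, `vegan` and `notes` keys are invented for illustration) to show how JSON types map onto Python types.

```python
import json

# Made-up JSON text for illustration.
text = (
    '{"name": "Pizza", "foodRanking": 1, '
    '"toppings": ["cheese", "tomato"], "vegan": false, "notes": null}'
)

data = json.loads(text)  # JSON text -> Python objects

print(type(data).__name__)           # JSON object becomes a dict
print(data["toppings"][0])           # JSON array becomes a list
print(data["vegan"], data["notes"])  # false/null become False/None

round_trip = json.dumps(data, indent=2)  # Python objects -> JSON text
print(round_trip)
```

Note that JSON's `true`, `false` and `null` are written in lowercase and are not quoted; a quoted `"null"` is just a string.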


#### Status codes

So we've sent off a `GET` request, but how do we know whether it was successful?

Servers issue numeric [status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) in response to HTTP requests that indicate whether a request has been successfully completed.

Some common ones relating to `GET` requests are:
* `200` - Success
* `301` - The API is redirecting to a different endpoint
* `400` - Bad request
* `401` - Not authenticated
* `403` - Forbidden
* `404` - Not found
* `429` - Too many requests
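Status codes are grouped by their first digit: `2xx` for success, `3xx` for redirects, `4xx` for client errors and `5xx` for server errors. As a rough sketch, the helper below (the name `describe` is our own) uses the standard-library `http.HTTPStatus` enum to look up the standard phrase for a code and classify it.

```python
from http import HTTPStatus


def describe(status_code):
    """Return the code, its standard phrase and its broad category."""
    phrase = HTTPStatus(status_code).phrase
    if status_code < 200:
        kind = "informational"
    elif status_code < 300:
        kind = "success"
    elif status_code < 400:
        kind = "redirect"
    elif status_code < 500:
        kind = "client error"
    else:
        kind = "server error"
    return f"{status_code} {phrase} ({kind})"


for code in (200, 301, 400, 401, 403, 404, 429, 500):
    print(describe(code))
```

With `requests`, the same check is usually done through `response.status_code` or `response.raise_for_status()`.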

______________

## Part 1:

### Retrieve Post Codes

A simple example to show how APIs work.


### Part 1: Working with a simple API 


#### 1 - Find the API documentation

 - https://postcodeapi.com.au/

```bash
curl http://v0.postcodeapi.com.au/suburbs/3066.json \
  -H 'Accept: application/json; indent=4'
```

#### 2 - Convert curl to python

https://curl.trillworks.com/

In [None]:
import requests

url = "http://v0.postcodeapi.com.au/suburbs/2010.json"
headers = {
    'Accept': 'application/json',
}

response = requests.get(url, headers=headers)
response.status_code

In [None]:
response.json()

______________

## Part 2:

### Revisit Google Trends


In [None]:
import pandas as pd
# 1) make a list of all the markets
markets = ["nsw", "vic", "qld","wa"] 

# 2) make an empty list for the dataframes
dfs = []

# 3) loop through all the markets and append its dataframe to the list
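for market in markets:
    try:
        csv_path = f"data/1_trends/officeworks/5-21_{market}.csv"
        dfs.append(pd.read_csv(csv_path, header=1, index_col="Day"))
    except FileNotFoundError:
        # skip markets with no CSV file, but report them instead of failing silently
        print(f"No file found for {market}, skipping")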
# 4) combine the list of dataframes
df_combined = pd.concat(dfs, axis=1)


df_combined.plot(figsize=(12,4))

### 1) API URI

#### 1 - Retrieve the url and headers from the Browser


In [None]:
r = requests.get(url="https://trends.google.com/trends/api/widgetdata/multiline?hl=en-US&tz=-600&req=%7B%22time%22:%222020-07-12+2021-07-12%22,%22resolution%22:%22WEEK%22,%22locale%22:%22en-US%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22AU%22%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22ENTITY%22,%22value%22:%22%2Fm%2F08h9qn%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAYO0QQfxLR-8OFPSfNFfruCCHFpFzQ4rV&tz=-600",
                headers={'Content-type':'application/json; charset=UTF-8'}
)
r.status_code

#### 2 - Parse the data


In [None]:
import json
# r.content[5:] skips the 5-byte non-JSON prefix at the start of the response before parsing
pd.DataFrame(json.loads(r.content[5:])['default']['timelineData']).head()
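
The `[5:]` slice is needed because the response body does not start with valid JSON. The sketch below reproduces this offline with a made-up payload: it assumes the prefix is the five characters `)]}',`, and the `time` and `value` fields are illustrative only (real responses carry more fields). The helper name `parse_prefixed_json` is our own.

```python
import json

# Made-up sample shaped like the response parsed above (assumed prefix and fields).
payload = {"default": {"timelineData": [{"time": "1594512000", "value": [42]}]}}
raw = b")]}',\n" + json.dumps(payload).encode("utf-8")


def parse_prefixed_json(content, prefix_length=5):
    """Drop the first `prefix_length` bytes, then parse the rest as JSON."""
    return json.loads(content[prefix_length:])


# Parsing the raw body directly fails because of the prefix.
try:
    json.loads(raw)
    failed = False
except json.JSONDecodeError:
    failed = True

rows = parse_prefixed_json(raw)["default"]["timelineData"]
print(failed, rows)
```

A list of dictionaries like `rows` is exactly what `pd.DataFrame` accepts, which is why the parsed `timelineData` can be passed to it directly.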

### 2) API Wrapper

- Web application APIs are often difficult to work with directly, because they are optimised for web browsers
- Python has many wrapper libraries to support working directly with the data

#### 1 - Install the pytrends library

In [None]:
pip install pytrends

#### 2 - Follow the instructions to construct a request

In [None]:
from pytrends.request import TrendReq
pytrends = TrendReq(hl='en-AU', tz=-600) # create a new pytrends object (tz=-600 is UTC+10, as in the URL above)

In [None]:
kw_list = ["Officeworks"] # build a keyword list
pytrends.build_payload(kw_list,timeframe="2021-05-01 2021-05-31", geo="AU-VIC") #build a payload
df = pytrends.interest_over_time() # get the data
df.head()

#### 3 - Format the DataFrame to use original column names

In [None]:
df.index.name="Day"
df.rename(columns={"Officeworks":"Officeworks: (Victoria)"}, inplace=True)
df.drop("isPartial", axis=1, inplace=True)
df.head()

#### 4 - Exercise: build the plots again using the pytrends package

In [None]:
# First define constants

client_list = ["Officeworks", "Johnson & Johnson", "Lotterywest"]
market_dict = {
    "AU-NSW": "New South Wales",
    "AU-VIC": "Victoria",
    "AU-QLD": "Queensland",
}
FIG_SIZE = (12, 4)

In [None]:
# Second define utility functions
def format_df(df, keyword, market):
    df.index.name="Day"
    market_name = market_dict[market]
    new_col_name = f"{keyword}: ({market_name})"
    df.rename(columns={keyword: new_col_name}, inplace=True)
    df.drop("isPartial", axis=1, inplace=True)
    return df

def get_trends_df(client, market, start_date, end_date):
    kw_list = [client] 
    pytrends.build_payload(kw_list,timeframe=f"{start_date} {end_date}", geo=market)
    df = pytrends.interest_over_time()
    return format_df(df, client, market)
    

In [None]:
# Third define final script
def plot_trends(client_list, market_list, start_date, end_date):
    for client in client_list:
        dfs = []
        for market in market_list:
            print(f"Collecting trends for {client} in {market}")
            dfs.append(get_trends_df(client, market, start_date, end_date))
        pd.concat(dfs, axis=1).plot(figsize=FIG_SIZE)
    

    
    

In [None]:
# Run script
plot_trends(client_list, market_dict,"2021-05-01", "2021-05-31" )

______________

## Additional Resources

### Real-world considerations for APIs

Things do not always go so smoothly, particularly when using APIs at scale.

We'll quickly cover some other common considerations when using APIs, and outline how they can be handled.

#### Retries

Sometimes you can do everything correctly and send off a request, but something on the web server (or elsewhere) goes wrong and you get a bad status code back.
We don't want to silently ignore these errors, nor let them crash our program with an unhandled exception.

The first port of call is to retry the request.

---

A hacky way to do this would be (**don't do this**):

``` python
import time
import requests

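def get(url):
    try:
        r = requests.get(url)
        r.raise_for_status()  # raise an error on a bad status
        return r
    except:  # bare except: hides every error, including typos and Ctrl-C
        time.sleep(1)  # sleep for a bit in case that helps
        return get(url)  # try again, forever if the error never goes away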
```

---

A better and easier way to do this is to use the [tenacity](https://github.com/jd/tenacity) library:

``` python
import requests
from tenacity import (retry, stop_after_attempt, wait_fixed,
                      retry_if_exception_type)

@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.1),
      retry=retry_if_exception_type(requests.HTTPError))
def get(url):
    try:
        r = requests.get(url)
        r.raise_for_status()  # raise an error on a bad status
        return r
    except requests.HTTPError:
        print(r.status_code, r.reason)
        raise
```

It uses a Python decorator (the `@` symbol) to wrap our function with another function, `retry`, that calls it again if it raises an error.

We can tell it how many attempts to make before giving up, how long to wait between retries, which errors to retry on, etc.
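
To show what such a decorator does under the hood, here is a deliberately simplified sketch written with the standard library only. It is not how tenacity is implemented; the names `simple_retry` and `flaky` are invented for this illustration, and `flaky` simulates a request that fails twice before succeeding.

```python
import functools
import time


def simple_retry(max_attempts=3, wait_seconds=0.1, exceptions=(Exception,)):
    """Retry the decorated function up to `max_attempts` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the error propagate
                    time.sleep(wait_seconds)  # wait before the next attempt
        return wrapper
    return decorator


calls = {"count": 0}


@simple_retry(max_attempts=3, wait_seconds=0.01, exceptions=(ConnectionError,))
def flaky():
    """Stand-in for a request that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky()
print(result, "after", calls["count"], "attempts")
```

Unlike the hacky version, the number of attempts is bounded and only the listed exception types trigger a retry.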

#### Authentication

Not all APIs are open for immediate use. For example, some require you to pay for access (e.g. the Google Maps API) and some require you to register for access first.

When you get access to a "closed" API, you will typically get an API key (a long string of letters and numbers) that is unique to you and that you need to send along with any request you make to the API.
This lets the API know who you are and decide how to deal with your request.

Several different types of authentication exist (read the specific API docs), but a common way is to send the key as the username in HTTP Basic authentication:
``` python
api_key = 'asodifhafglkkhj'
r = requests.get(url, auth=(api_key, ''))
```
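
Passing `auth=(api_key, '')` to `requests` uses HTTP Basic authentication, which puts a base64-encoded `username:password` pair in the `Authorization` header. The sketch below builds that header by hand with the standard library so you can see what is sent; the bearer-token variant is shown only as another scheme some APIs use, and which one applies depends on the API's documentation.

```python
import base64

api_key = "asodifhafglkkhj"  # placeholder key from the example above

# Basic auth: "username:password" (empty password here), base64-encoded.
credentials = f"{api_key}:".encode("utf-8")
basic_header = {
    "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")
}

# Some APIs expect the key as a bearer token instead (an alternative scheme).
bearer_header = {"Authorization": f"Bearer {api_key}"}

print(basic_header)
print(bearer_header)
```

Base64 is an encoding, not encryption, so keys sent this way should only travel over `https`.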

#### Rate limits

APIs can be costly to host, so they typically limit the number of requests that can be made (either per IP address or per API key).
If you exceed this limit you'll get a `429` status code for any extra requests you make (and may be blocked if you continue making them).

It is therefore important to respect any rate limits given in an API's documentation (annoyingly, some are very vague).
The simplest way to do this is to limit how many times the function that makes the request can be called within a given time period, for example with the [ratelim](https://pypi.org/project/ratelim/) library, again using decorators.
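
As an illustration of the idea (not of how ratelim itself works), the standard-library sketch below enforces a minimum gap between calls. The names `min_interval` and `fetch` are invented for this example, and `fetch` is a stand-in for a function that would make a real request.

```python
import functools
import time


def min_interval(seconds):
    """Ensure at least `seconds` elapse between calls to the function."""
    def decorator(func):
        last_call = [None]  # time of the previous call, if any

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if last_call[0] is not None:
                remaining = seconds - (time.monotonic() - last_call[0])
                if remaining > 0:
                    time.sleep(remaining)  # wait out the rest of the interval
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@min_interval(0.05)
def fetch(i):
    """Stand-in for a function that makes an API request."""
    return i


start = time.monotonic()
results = [fetch(i) for i in range(3)]
elapsed = time.monotonic() - start
print(results, f"{elapsed:.2f}s")
```

Three calls with a 0.05 second interval take roughly 0.1 seconds, because the first call goes through immediately and the next two each wait.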

### Resources

In [None]:
## REDDIT - https://www.reddit.com/dev/api/#GET_subreddits_search
# Let's find some subreddits to learn python with!

url = 'https://www.reddit.com/subreddits/search.json?q="learn python"&limit=5'
r = requests.get(url, headers={'User-agent': 'your bot 0.1'})
r.raise_for_status()

[result['data']['display_name_prefixed']
 for result in r.json()['data']['children']]
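
The search URL above contains quotes and a space inside the query string, which have to be percent-encoded before they go over the wire. As a small sketch, the same URL can be assembled from a dictionary of parameters with the standard-library `urllib.parse.urlencode`, which handles the encoding for you.

```python
from urllib.parse import urlencode

base_url = "https://www.reddit.com/subreddits/search.json"
params = {"q": '"learn python"', "limit": 5}

# urlencode percent-encodes the quotes and turns the space into "+".
url = base_url + "?" + urlencode(params)
print(url)
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` achieves the same result without building the string by hand.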

### List of APIs

Massive list [**here**](https://github.com/public-apis/public-apis)


### API wrapper libraries

Massive list [**here**](https://github.com/realpython/list-of-python-api-wrappers).