# Accessing data from an API

This notebook has two simple exercises demonstrating how to extract data from an [Application Programming Interface](https://en.wikipedia.org/wiki/API) (API). An API is a way for computers or applications to interact with one another. In our case, we'll ask for data and the API will return it. These systems can be complicated, but most of the ones we might use in data journalism are relatively simple.

#### Import our data tools

In [1]:
%load_ext lab_black

In [2]:
import pandas as pd
import requests

In [3]:
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = None

---

## Cat facts!

[Read the documentation](https://alexwohlbruck.github.io/cat-facts/docs/)

#### Get random facts

In [4]:
cat_df = pd.read_json(
    "https://cat-fact.herokuapp.com/facts/random?animal_type=cat&amount=500"
)
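
Everything after the `?` in that URL is a query string: `animal_type` and `amount` are the parameters the request passes to the API. As a minimal sketch, the same URL can be assembled from a dictionary with the standard library, which makes the parameters easier to change. The variable names here are illustrative and not part of the original notebook.

```python
from urllib.parse import urlencode

# Base endpoint and parameters, copied from the URL used above
base_url = "https://cat-fact.herokuapp.com/facts/random"
params = {"animal_type": "cat", "amount": 500}

# urlencode joins the pairs as key=value, separated by "&"
url = f"{base_url}?{urlencode(params)}"
print(url)
```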

#### First five rows

In [5]:
cat_df.head()

| | status | _id | updatedAt | createdAt | user | text | deleted | source | __v | type | used | sendDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'verified': True, 'sentCount': 1} | 5a4aab252c99ee00219e11c3 | 2020-08-23T20:20:01.611Z | 2018-01-28T21:20:03.068Z | 5a9ac18c7478810ea6c06381 | Cats can move their ears 180 degrees. | False | user | 0.0 | cat | 0.0 | |
| 1 | {'verified': True, 'sentCount': 1} | 5c3552058e0b8d00148d45e5 | 2020-08-23T20:20:01.611Z | 2019-01-09T01:44:37.783Z | 5a9ac18c7478810ea6c06381 | Despite its traditionally wild roots, the Bengal is domestic and will gladly make itself in the indoor "jungle" of your home. | False | user | 0.0 | cat | 0.0 | |
| 2 | {'verified': None, 'sentCount': 0} | 60da57a36adf1000178e871d | 2021-06-28T23:13:39.449Z | 2021-06-28T23:13:39.449Z | 60d99d8c6adf1000178e6105 | Gg. | False | | 0.0 | cat | | |
| 3 | {'verified': True, 'sentCount': 1} | 58e00ba00aac31001185edfa | 2020-08-23T20:20:01.611Z | 2018-02-22T21:20:03.364Z | 58e007480aac31001185ecef | When cats leave their poop uncovered, it is a sign of aggression to let you know they don't fear you. | False | user | 0.0 | cat | 0.0 | |
| 4 | {'verified': None, 'sentCount': 0} | 5fd56ccbb3fb8b001735716c | 2020-12-13T01:22:19.334Z | 2020-12-13T01:22:19.334Z | 5fd56c8db3fb8b0017357163 | Cats have nearly twice the amount of neurons in their cerebral cortex as dogs. | False | | 0.0 | cat | | |


#### How many records? 

In [6]:
len(cat_df)

500

#### What's the first fact?

In [7]:
cat_df["text"][0]

'Cats can move their ears 180 degrees.'

In [8]:
cat_df["status"].head()

0    {'verified': True, 'sentCount': 1}
1    {'verified': True, 'sentCount': 1}
2    {'verified': None, 'sentCount': 0}
3    {'verified': True, 'sentCount': 1}
4    {'verified': None, 'sentCount': 0}
Name: status, dtype: object

#### Extract the nested JSON inside the `status` column

In [9]:
pd.json_normalize(cat_df["status"]).head()

| | verified | sentCount | feedback |
|---|---|---|---|
| 0 | True | 1 | |
| 1 | True | 1 | |
| 2 | | 0 | |
| 3 | True | 1 | |
| 4 | | 0 | |


In [10]:
cat_df[["verified", "sentCount", "feedback"]] = pd.json_normalize(cat_df["status"])
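
To see what the flattening step is doing, here is a plain-Python sketch (not how pandas implements `json_normalize`) that turns a few `status` dictionaries into one flat row per record. The first two records are copied from the output above. The third, with a `feedback` value, is hypothetical and only there to show how a key missing from other records is filled with `None`.

```python
# First two values copied from the `status` column shown above;
# the third is a made-up record that carries a "feedback" key
statuses = [
    {"verified": True, "sentCount": 1},
    {"verified": None, "sentCount": 0},
    {"verified": False, "sentCount": 0, "feedback": "hypothetical note"},
]

# Collect every key that appears in any record, in first-seen order
columns = []
for status in statuses:
    for key in status:
        if key not in columns:
            columns.append(key)

# Build one flat row per record; keys a record lacks become None
rows = [{col: status.get(col) for col in columns} for status in statuses]

for row in rows:
    print(row)
```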

#### Slim the dataframe

In [11]:
cat_df.columns

Index(['status', '_id', 'updatedAt', 'createdAt', 'user', 'text', 'deleted',
       'source', '__v', 'type', 'used', 'sendDate', 'verified', 'sentCount',
       'feedback'],
      dtype='object')

In [12]:
cat_df_slim = cat_df[["_id", "text", "createdAt", "verified"]].copy()

#### Just the verified facts, pls

In [13]:
verified_df = cat_df_slim[cat_df_slim["verified"] == True].copy()

#### Find facts that mention specific words

In [14]:
verified_df[verified_df["text"].str.lower().str.contains("dog")].head(2)

| | _id | text | createdAt | verified |
|---|---|---|---|---|
| 37 | 591f98088dec2e14e3c20b0e | In 1987, cats overtook dogs as the number one pet in America (about 50 million cats resided in 24 million homes in 1986). About 37% of American homes today have at least one cat. | 2018-01-04T01:10:54.673Z | True |
| 59 | 591f7aab0cf1d60ee8afcd61 | A cat's brain is more similar to a man's brain than that of a dog. | 2018-01-04T01:10:54.673Z | True |


In [15]:
verified_df[verified_df["text"].str.lower().str.contains("dog|food")].head(2)

| | _id | text | createdAt | verified |
|---|---|---|---|---|
| 37 | 591f98088dec2e14e3c20b0e | In 1987, cats overtook dogs as the number one pet in America (about 50 million cats resided in 24 million homes in 1986). About 37% of American homes today have at least one cat. | 2018-01-04T01:10:54.673Z | True |
| 59 | 591f7aab0cf1d60ee8afcd61 | A cat's brain is more similar to a man's brain than that of a dog. | 2018-01-04T01:10:54.673Z | True |
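
In the second filter, the `|` matters: `str.contains` treats its pattern as a regular expression by default, so `"dog|food"` matches text containing either word. A small sketch with Python's `re` module, using two facts from the outputs above plus one made-up sentence:

```python
import re

texts = [
    "A cat's brain is more similar to a man's brain than that of a dog.",
    "Cats can move their ears 180 degrees.",
    "Some cats are picky about food.",  # made-up sentence for illustration
]

# "|" means "or" in a regular expression
pattern = re.compile("dog|food")

# Lowercase first, as the notebook does, then keep texts matching either word
matches = [t for t in texts if pattern.search(t.lower())]
print(len(matches))
```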


#### Create a date column for sorting

In [16]:
verified_df["date"] = pd.to_datetime(verified_df["createdAt"]).dt.strftime("%Y-%m-%d")
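
The `%Y-%m-%d` format keeps only the calendar date and, because the year comes first, these strings sort in chronological order even though they are text. A standard-library sketch using two `createdAt` values from the first rows of `cat_df`:

```python
from datetime import datetime

# createdAt values copied from the first rows of cat_df
created = ["2021-06-28T23:13:39.449Z", "2018-01-28T21:20:03.068Z"]

# Parse the ISO 8601 timestamp, then keep only the date part
dates = [
    datetime.strptime(c, "%Y-%m-%dT%H:%M:%S.%fZ").strftime("%Y-%m-%d")
    for c in created
]

# Sorting the strings puts the oldest date first
print(sorted(dates))
```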

#### Most recent verified fact?

In [17]:
verified_df.sort_values("date", ascending=False).head()

| | _id | text | createdAt | verified | date |
|---|---|---|---|---|---|
| 141 | 5daa192179186100154250c4 | GitHub is a cloud source version control system where its mascot is an octocat, an anthropomorphized cat with five tentacles. | 2019-10-18T19:57:21.696Z | True | 2019-10-18 |
| 425 | 5d9d4ae168a764001553b388 | Cats conserve energy by sleeping for an average of 13 to 14 hours a day. | 2019-10-09T02:50:09.633Z | True | 2019-10-09 |
| 441 | 5d9c556168a764001553b382 | A cat has 244 bones in its entire body—even more than a human, who only has 206 bones. | 2019-10-08T09:22:41.032Z | True | 2019-10-08 |
| 438 | 5d38bd750f1c57001592f155 | Legend holds that a goddess rewarded a temple cat's piety by turning the cat's eyes blue and his coat golden, thus creating the first Birman cat. | 2019-07-24T20:20:05.522Z | True | 2019-07-24 |
| 424 | 5d38b64e0f1c57001592f134 | The Havana Brown breed hails from England, where it was created by crossbreeding Siamese cats with domestic black cats. | 2019-07-24T19:49:34.943Z | True | 2019-07-24 |


---

## Dad jokes!

[Read the documentation](https://icanhazdadjoke.com/api#fetch-a-random-dad-joke)

#### Set the request headers so the API knows which format to return

In [18]:
headers = {
    "Accept": "application/json",
}

#### Get a response from the API in the format we requested

In [19]:
response = requests.get("https://icanhazdadjoke.com/search?page=1", headers=headers)

#### What comes back?

In [20]:
response.json()

{'current_page': 1,
 'limit': 20,
 'next_page': 2,
 'previous_page': 1,
 'results': [{'id': '0189hNRf2g',
   'joke': "I'm tired of following my dreams. I'm just going to ask them where they are going and meet up with them later."},
  {'id': '08EQZ8EQukb',
   'joke': "Did you hear about the guy whose whole left side was cut off? He's all right now."},
  {'id': '08xHQCdx5Ed',
   'joke': 'Why didn’t the skeleton cross the road? Because he had no guts.'},
  {'id': '0DQKB51oGlb',
   'joke': "What did one nut say as he chased another nut?  I'm a cashew!"},
  {'id': '0DtrrOZDlyd',
   'joke': "Chances are if you' ve seen one shopping center, you've seen a mall."},
  {'id': '0LuXvkq4Muc',
   'joke': "I knew I shouldn't steal a mixer from work, but it was a whisk I was willing to take."},
  {'id': '0ga2EdN7prc',
   'joke': 'How come the stadium got hot after the game? Because all of the fans left.'},
  {'id': '0oO71TSv4Ed',
   'joke': 'Why was it called the dark ages? Because of all the knights.

#### What's the limit per API call? 

In [21]:
response.json()["limit"]

20

#### How many total jokes? 

In [22]:
response.json()["total_jokes"]

649

#### How many pages of 20 jokes? 

In [23]:
response.json()["total_pages"]

33
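
The page count follows from the two numbers above: 649 jokes at 20 per page fill 32 complete pages, with 9 jokes left over for a 33rd. As a quick check:

```python
import math

total_jokes = 649  # from response.json()["total_jokes"]
limit = 20  # from response.json()["limit"]

# Round up, because a partly filled last page still counts as a page
total_pages = math.ceil(total_jokes / limit)
jokes_on_last_page = total_jokes - (total_pages - 1) * limit
print(total_pages, jokes_on_last_page)
```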

#### Ok, just the jokes

In [24]:
response.json()["results"]

[{'id': '0189hNRf2g',
  'joke': "I'm tired of following my dreams. I'm just going to ask them where they are going and meet up with them later."},
 {'id': '08EQZ8EQukb',
  'joke': "Did you hear about the guy whose whole left side was cut off? He's all right now."},
 {'id': '08xHQCdx5Ed',
  'joke': 'Why didn’t the skeleton cross the road? Because he had no guts.'},
 {'id': '0DQKB51oGlb',
  'joke': "What did one nut say as he chased another nut?  I'm a cashew!"},
 {'id': '0DtrrOZDlyd',
  'joke': "Chances are if you' ve seen one shopping center, you've seen a mall."},
 {'id': '0LuXvkq4Muc',
  'joke': "I knew I shouldn't steal a mixer from work, but it was a whisk I was willing to take."},
 {'id': '0ga2EdN7prc',
  'joke': 'How come the stadium got hot after the game? Because all of the fans left.'},
 {'id': '0oO71TSv4Ed',
  'joke': 'Why was it called the dark ages? Because of all the knights. '},
 {'id': '0oz51ozk3ob', 'joke': 'A steak pun is a rare medium well done.'},
 {'id': '0ozAXv4Mmj

#### How many records?

In [25]:
len(response.json()["results"])

20

#### Get all the jokes with a loop

In [26]:
data_pages = []

# Pages are numbered 1 through 33 (total_pages), so start at 1 rather than 0
for page in range(1, 34):
    data_pages.append(
        pd.DataFrame(
            requests.get(
                f"https://icanhazdadjoke.com/search?page={page}", headers=headers
            ).json()["results"]
        )
    )

jokes_df = pd.concat(data_pages).reset_index(drop=True)

#### How many?

In [27]:
len(jokes_df)

649
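
The loop above hard-codes the number of pages. Since each response reports `current_page`, `next_page` and `total_pages`, another option is to keep requesting until the last page is reached. The sketch below shows that control flow with a stand-in `fetch_page` function that fabricates responses locally, so it runs without network access; in the notebook it would be replaced by the `requests.get(...).json()` call. The function name, the placeholder jokes, and the behavior of `next_page` on the last page are assumptions for illustration only.

```python
TOTAL_JOKES = 649
LIMIT = 20
TOTAL_PAGES = 33


def fetch_page(page):
    """Stand-in for the API call: returns a dict shaped like the real response."""
    start = (page - 1) * LIMIT
    stop = min(start + LIMIT, TOTAL_JOKES)
    return {
        "current_page": page,
        # Assumption: on the last page, next_page points at the last page itself
        "next_page": min(page + 1, TOTAL_PAGES),
        "total_pages": TOTAL_PAGES,
        "results": [
            {"id": str(i), "joke": f"placeholder joke {i}"}
            for i in range(start, stop)
        ],
    }


all_jokes = []
page = 1
while True:
    payload = fetch_page(page)
    all_jokes.extend(payload["results"])
    # Stop once the response says this was the last page
    if payload["current_page"] >= payload["total_pages"]:
        break
    page = payload["next_page"]

print(len(all_jokes))
```

Reading the page count from the response means the loop keeps working if jokes are added to the API later.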

#### Export 

In [28]:
jokes_df.to_csv("../data/processed/dad-jokes.csv", index=False)