# Accessing data from an API

This notebook has two simple exercises demonstrating how to extract data from an [Application Programming Interface](https://en.wikipedia.org/wiki/API). An API is a way for computers or applications to interact with one another. In our case, we'll ask for data and the API will return it. These systems can be complicated, but most of those we might use in data journalism are relatively simple.

#### Import our data tools

In [63]:
%load_ext lab_black

The lab_black extension is already loaded. To reload it, use:
  %reload_ext lab_black


In [64]:
import pandas as pd
import requests

In [65]:
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = None

---

## Cat facts!

[Read the documentation](https://alexwohlbruck.github.io/cat-facts/docs/)

#### Get random facts

In [66]:
cat_df = pd.read_json(
    "https://cat-fact.herokuapp.com/facts/random?animal_type=cat&amount=500"
)

#### First five rows

In [67]:
cat_df.head()

Unnamed: 0,status,_id,user,text,type,deleted,createdAt,updatedAt,__v,source,used
0,"{'verified': None, 'sentCount': 0}",61d99d283c4466aa794b0d28,61d330ae403b4002d37918f1,My new new fact.,cat,False,2022-01-08T14:18:16.821Z,2022-01-08T14:18:16.821Z,0.0,,
1,"{'verified': True, 'sentCount': 1}",5b1b4055841d9700146158d3,5a9ac18c7478810ea6c06381,Scottish sailer Alexander Selkirk once survived for 4 years on a deserted island thanks to feral cats that protected him from large rats during the night.,cat,False,2018-07-02T20:20:03.046Z,2020-08-23T20:20:01.611Z,0.0,user,0.0
2,"{'verified': True, 'sentCount': 1}",5a456246255f4b0021f54c04,5a9ac18c7478810ea6c06381,A cat can die from essential oils,cat,False,2018-01-25T21:20:05.232Z,2020-08-23T20:20:01.611Z,0.0,user,0.0
3,"{'verified': None, 'sentCount': 0}",6161eac7b5401f0017b61bff,6161eaa6b5401f0017b61bf8,Meow goes Meow after Meow goes Meow.,cat,False,2021-10-09T19:17:27.624Z,2021-10-09T19:17:27.624Z,0.0,,
4,"{'verified': None, 'sentCount': 0}",61faa1ee30eb726f3dda1672,61faa14730eb726f3dda1602,Dsac.,cat,False,2022-02-02T15:23:26.757Z,2022-02-02T15:23:26.757Z,0.0,,


#### How many records? 

In [68]:
len(cat_df)

500

#### What's the first fact?

In [69]:
cat_df["text"][0]

'My new new fact.'

#### Extract the nested JSON inside the `status` column

In [70]:
cat_df[["verified", "sentCount"]] = pd.json_normalize(cat_df["status"])

In [71]:
cat_df.head()

Unnamed: 0,status,_id,user,text,type,deleted,createdAt,updatedAt,__v,source,used,verified,sentCount
0,"{'verified': None, 'sentCount': 0}",61d99d283c4466aa794b0d28,61d330ae403b4002d37918f1,My new new fact.,cat,False,2022-01-08T14:18:16.821Z,2022-01-08T14:18:16.821Z,0.0,,,,0
1,"{'verified': True, 'sentCount': 1}",5b1b4055841d9700146158d3,5a9ac18c7478810ea6c06381,Scottish sailer Alexander Selkirk once survived for 4 years on a deserted island thanks to feral cats that protected him from large rats during the night.,cat,False,2018-07-02T20:20:03.046Z,2020-08-23T20:20:01.611Z,0.0,user,0.0,True,1
2,"{'verified': True, 'sentCount': 1}",5a456246255f4b0021f54c04,5a9ac18c7478810ea6c06381,A cat can die from essential oils,cat,False,2018-01-25T21:20:05.232Z,2020-08-23T20:20:01.611Z,0.0,user,0.0,True,1
3,"{'verified': None, 'sentCount': 0}",6161eac7b5401f0017b61bff,6161eaa6b5401f0017b61bf8,Meow goes Meow after Meow goes Meow.,cat,False,2021-10-09T19:17:27.624Z,2021-10-09T19:17:27.624Z,0.0,,,,0
4,"{'verified': None, 'sentCount': 0}",61faa1ee30eb726f3dda1672,61faa14730eb726f3dda1602,Dsac.,cat,False,2022-02-02T15:23:26.757Z,2022-02-02T15:23:26.757Z,0.0,,,,0
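To see what `pd.json_normalize` does in isolation, here is a minimal sketch on a tiny made-up sample shaped like the API records above. The `records` values are hypothetical, and joining the flattened columns back onto the frame is just one reasonable way to do it.

```python
import pandas as pd

# Hypothetical sample shaped like the API response above (values are made up).
records = [
    {"_id": "a1", "text": "First sample fact.", "status": {"verified": True, "sentCount": 1}},
    {"_id": "b2", "text": "Second sample fact.", "status": {"verified": None, "sentCount": 0}},
]
sample_df = pd.DataFrame(records)

# json_normalize turns each dict in the column into one row of flat columns.
status_df = pd.json_normalize(sample_df["status"].tolist())

# Join the new columns back on the shared index and drop the nested original.
flat_df = sample_df.drop(columns="status").join(status_df)
print(flat_df.columns.tolist())
```

Both frames share the default 0, 1, 2... index here, which is what lets the join (or a direct column assignment) line the rows up. If the dataframe had been filtered or re-sorted first, resetting its index before normalizing would be the safer move.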


#### Slim the dataframe

In [72]:
cat_df_slim = cat_df[["_id", "text", "createdAt", "verified"]].copy()

In [73]:
cat_df_slim.head()

Unnamed: 0,_id,text,createdAt,verified
0,61d99d283c4466aa794b0d28,My new new fact.,2022-01-08T14:18:16.821Z,
1,5b1b4055841d9700146158d3,Scottish sailer Alexander Selkirk once survived for 4 years on a deserted island thanks to feral cats that protected him from large rats during the night.,2018-07-02T20:20:03.046Z,True
2,5a456246255f4b0021f54c04,A cat can die from essential oils,2018-01-25T21:20:05.232Z,True
3,6161eac7b5401f0017b61bff,Meow goes Meow after Meow goes Meow.,2021-10-09T19:17:27.624Z,
4,61faa1ee30eb726f3dda1672,Dsac.,2022-02-02T15:23:26.757Z,


#### Just the verified facts, pls

In [74]:
verified_df = cat_df_slim[cat_df_slim["verified"] == True]
verified_df.head()

Unnamed: 0,_id,text,createdAt,verified
1,5b1b4055841d9700146158d3,Scottish sailer Alexander Selkirk once survived for 4 years on a deserted island thanks to feral cats that protected him from large rats during the night.,2018-07-02T20:20:03.046Z,True
2,5a456246255f4b0021f54c04,A cat can die from essential oils,2018-01-25T21:20:05.232Z,True
5,591f98703b90f7150a19c12c,"Tests done by the Behavioral Department of the Musuem of Natural History conclude that while a dog's memory lasts about 5 minutes, a cat's recall can last as long as 16 hours.",2018-01-04T01:10:54.673Z,True
6,591f98783b90f7150a19c1da,The cat appears to be the only domestic companion animal not mentioned in the Bible.,2018-01-04T01:10:54.673Z,True
8,5d38bc970f1c57001592f152,"The pink substance inside a cat's nail is called the ""quick"" or the ""dermis"". When trimming your cat's nails, be sure to only cut the upper white part of the nail, not the quick.",2019-07-24T20:16:23.558Z,True


In [75]:
len(verified_df)

296

#### Find facts that mention specific words

In [76]:
verified_df[verified_df["text"].str.contains("dog|food|toys", case=False)]

Unnamed: 0,_id,text,createdAt,verified
5,591f98703b90f7150a19c12c,"Tests done by the Behavioral Department of the Musuem of Natural History conclude that while a dog's memory lasts about 5 minutes, a cat's recall can last as long as 16 hours.",2018-01-04T01:10:54.673Z,True
40,5b1b3f56841d9700146158cc,Cats lack antibodies against dog blood so they can only receive it via a transfusion once. The second time would kill them.,2018-06-22T20:20:02.344Z,True
45,591f98783b90f7150a19c1cc,British cat owners spend roughly 550 million pounds yearly on cat food.,2018-04-15T20:20:02.691Z,True
54,591f98883b90f7150a19c267,Cat bites are more likely to become infected than dog bites.,2018-01-04T01:10:54.673Z,True
56,591f98783b90f7150a19c1b5,"Cats have 32 muscles that control the outer ear (compared to human's 6 muscles each). A cat can rotate its ears independently 180 degrees, and can turn in the direction of sound 10 times faster than those of the best watchdog.",2018-01-04T01:10:54.673Z,True
84,5c609776e54902001453302d,"When kittens ages to weeks five and six, they should start making the transition to dry food.",2019-02-10T21:28:22.019Z,True
95,591f98803b90f7150a19c238,In 1987 cats overtook dogs as the number one pet in America.,2018-01-04T01:10:54.673Z,True
109,591f98108dec2e14e3c20b0f,Cats have been domesticated for half as long as dogs have been.,2018-01-04T01:10:54.673Z,True
121,5c60999ae549020014533038,"Cats love playing with yarn, ribbons, and fishing-rode style toys. However, cats should not be left alone with anything that they could get tangled in.",2019-02-10T21:37:30.093Z,True
122,591f9894d369931519ce3593,The average cat food meal is the equivalent to about five mice.,2018-01-04T01:10:54.673Z,True
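Notice that the search above also catches "watchdog", because `str.contains` matches substrings. If whole words are what we want, a regex with word boundaries is one option. This is a small sketch on made-up sentences, not the API data; `na=False` treats missing text as a non-match.

```python
import pandas as pd

facts = pd.Series(
    [
        "Cats and dogs can share a home.",
        "A watchdog barks at strangers.",
        "Kittens move to dry food at five weeks.",
        None,  # missing text, to show how na=False behaves
    ]
)

# Plain alternation matches substrings, so "watchdog" counts as a hit for "dog".
loose = facts.str.contains("dog|food|toys", case=False, na=False)

# \b anchors each term to a word boundary; "s?" also allows a simple plural.
strict = facts.str.contains(r"\b(?:dog|food|toy)s?\b", case=False, na=False)

print(loose.tolist())
print(strict.tolist())
```

The non-capturing group `(?:...)` keeps pandas from warning about unused match groups.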


#### Find the oldest fact? 

In [77]:
verified_df = verified_df.assign(
    date=pd.to_datetime(verified_df["createdAt"]).dt.strftime("%Y-%m-%d")
)


In [81]:
verified_df.sort_values("date", ascending=True).head()

Unnamed: 0,_id,text,createdAt,verified,date
257,591f9894d369931519ce35ab,"Cats, especially older cats, do get cancer. Many times this disease can be treated successfully.",2018-01-04T01:10:54.673Z,True,2018-01-04
187,591f98703b90f7150a19c179,A cat's field of vision is about 200 degrees.,2018-01-04T01:10:54.673Z,True,2018-01-04
185,591f98783b90f7150a19c19a,"Mountain lions are strong jumpers, thanks to muscular hind legs that are longer than their front legs.",2018-01-04T01:10:54.673Z,True,2018-01-04
182,591f98088dec2e14e3c20b0e,"In 1987, cats overtook dogs as the number one pet in America (about 50 million cats resided in 24 million homes in 1986). About 37% of American homes today have at least one cat.",2018-01-04T01:10:54.673Z,True,2018-01-04
180,591f97c48dec2e14e3c20afc,"Cats respond most readily to names that end in an ""ee"" sound.",2018-01-04T01:10:54.673Z,True,2018-01-04
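Many facts share the same calendar date, so sorting on a date string leaves ties. One alternative, sketched here on hypothetical rows that use the same timestamp format as `createdAt`, is to keep a real datetime column and ask for the minimum and maximum directly.

```python
import pandas as pd

# Hypothetical rows using the same ISO 8601 timestamp format as createdAt.
sample = pd.DataFrame(
    {
        "text": ["fact A", "fact B", "fact C"],
        "createdAt": [
            "2019-07-24T20:16:23.558Z",
            "2018-01-04T01:10:54.673Z",
            "2018-01-25T21:20:05.232Z",
        ],
    }
)

# Keep a real datetime column (UTC-aware) instead of a formatted string.
sample["created"] = pd.to_datetime(sample["createdAt"], utc=True)

# idxmin/idxmax return the index label of the earliest/latest timestamp.
oldest = sample.loc[sample["created"].idxmin()]
newest = sample.loc[sample["created"].idxmax()]
print(oldest["text"], newest["text"])
```

If several rows share the exact same timestamp, as many of the 2018-01-04 facts do, `idxmin` returns only the first of them.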


#### Most recent verified fact?

In [82]:
verified_df.sort_values("date", ascending=False).head()

Unnamed: 0,_id,text,createdAt,verified,date
443,5de780600013130015a3ccaf,"About one in two cats respond to catnip, and only develop a sensitivity to it at around 3 to 6 months of age.",2019-12-04T09:46:08.461Z,True,2019-12-04
353,5daa192179186100154250c4,"GitHub is a cloud source version control system where its mascot is an octocat, an anthropomorphized cat with five tentacles.",2019-10-18T19:57:21.696Z,True,2019-10-18
248,5d9d4ae168a764001553b388,Cats conserve energy by sleeping for an average of 13 to 14 hours a day.,2019-10-09T02:50:09.633Z,True,2019-10-09
240,5d38bd750f1c57001592f155,"Legend holds that a goddess rewarded a temple cat's piety by turning the cat's eyes blue and his coat golden, thus creating the first Birman cat.",2019-07-24T20:20:05.522Z,True,2019-07-24
73,5d38baf20f1c57001592f14b,"The stationmaster of the Kishi rail station in western Japan until 2015 was a cat named Tama. The calico cat wore a stationmaster's cap and greeted visitors by the ticket gate. After her death, Tama was elevated to the status of a goddess in a Shinto-style funeral.",2019-07-24T20:09:22.445Z,True,2019-07-24


---

## Dad jokes!

[Read the documentation](https://icanhazdadjoke.com/api#fetch-a-random-dad-joke)

#### Set the request headers so the API knows which format to return

In [83]:
headers = {
    "Accept": "application/json",
}
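To check what a request will look like before sending it, `requests` can build and prepare the request without touching the network. A minimal sketch, passing the page number through `params` instead of typing it into the URL:

```python
import requests

# Build the request without sending it, so we can inspect what would go out.
request = requests.Request(
    "GET",
    "https://icanhazdadjoke.com/search",
    headers={"Accept": "application/json"},
    params={"page": 1},
)
prepared = request.prepare()

print(prepared.url)
print(prepared.headers["Accept"])
```

The prepared URL has the query string appended, and the `Accept` header is the part that asks the API for JSON.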

#### Get a response from the API in the format we requested

In [84]:
response = requests.get("https://icanhazdadjoke.com/search?page=1", headers=headers)

#### What comes back?

In [87]:
response.json()

{'current_page': 1,
 'limit': 20,
 'next_page': 2,
 'previous_page': 1,
 'results': [{'id': '0189hNRf2g',
   'joke': "I'm tired of following my dreams. I'm just going to ask them where they are going and meet up with them later."},
  {'id': '08EQZ8EQukb',
   'joke': "Did you hear about the guy whose whole left side was cut off? He's all right now."},
  {'id': '08xHQCdx5Ed',
   'joke': 'Why didn’t the skeleton cross the road? Because he had no guts.'},
  {'id': '0DQKB51oGlb',
   'joke': "What did one nut say as he chased another nut?  I'm a cashew!"},
  {'id': '0DtrrOZDlyd',
   'joke': "Chances are if you' ve seen one shopping center, you've seen a mall."},
  {'id': '0LuXvkq4Muc',
   'joke': "I knew I shouldn't steal a mixer from work, but it was a whisk I was willing to take."},
  {'id': '0ga2EdN7prc',
   'joke': 'How come the stadium got hot after the game? Because all of the fans left.'},
  {'id': '0oO71TSv4Ed',
   'joke': 'Why was it called the dark ages? Because of all the knights.

#### What's the limit per API call? 

In [99]:
response.json()["limit"]

20

#### How many total jokes? 

In [100]:
response.json()["total_jokes"]

649

#### How many pages of 20 jokes? 

In [101]:
response.json()["total_pages"]

33
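The page count follows from the two numbers above: 649 jokes at 20 per page. A quick check of that arithmetic, using only the values the API reported:

```python
import math

total_jokes = 649  # from response.json()["total_jokes"]
limit = 20  # from response.json()["limit"]

# Round up, because a partly filled last page still counts as a page.
total_pages = math.ceil(total_jokes / limit)

# Jokes left over for the final page after the full pages are counted.
last_page_size = total_jokes - (total_pages - 1) * limit

print(total_pages, last_page_size)
```

So 32 full pages plus a final page of 9 jokes gives the 33 pages reported.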

#### Ok, just the jokes

In [96]:
jokes_df = pd.DataFrame(response.json()["results"])
jokes_df.head()

Unnamed: 0,id,joke
0,0189hNRf2g,I'm tired of following my dreams. I'm just going to ask them where they are going and meet up with them later.
1,08EQZ8EQukb,Did you hear about the guy whose whole left side was cut off? He's all right now.
2,08xHQCdx5Ed,Why didn’t the skeleton cross the road? Because he had no guts.
3,0DQKB51oGlb,What did one nut say as he chased another nut? I'm a cashew!
4,0DtrrOZDlyd,"Chances are if you' ve seen one shopping center, you've seen a mall."


#### How many records?

In [92]:
len(jokes_df)

20

#### Get all the jokes with a loop

In [102]:
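total_pages = response.json()["total_pages"]
data_pages = []

# Pages are numbered from 1; requesting page 0 appears to return page 1 again,
# which would add 20 duplicate rows.
for page in range(1, total_pages + 1):
    data_pages.append(
        pd.DataFrame(
            requests.get(
                f"https://icanhazdadjoke.com/search?page={page}", headers=headers
            ).json()["results"]
        )
    )

jokes_df = pd.concat(data_pages).reset_index(drop=True)

#### How many? 

In [103]:
len(jokes_df)

649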
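The paging logic can also be pulled into a small function that reads `total_pages` from the first response and starts at page 1. This is a sketch, not the API's official client: `fetch_all_results` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for the real request so the example runs offline.

```python
from typing import Callable, Dict, List


def fetch_all_results(fetch_page: Callable[[int], Dict]) -> List[Dict]:
    """Collect `results` from every page, using `total_pages` from page 1."""
    first = fetch_page(1)
    results = list(first["results"])
    for page in range(2, first["total_pages"] + 1):
        results.extend(fetch_page(page)["results"])
    return results


# Stand-in for the real API call, so the sketch runs offline (hypothetical data).
def fake_fetch(page: int) -> Dict:
    jokes = [{"id": str(i), "joke": f"joke {i}"} for i in range(45)]
    limit = 20
    start = (page - 1) * limit
    return {
        "current_page": page,
        "limit": limit,
        "total_jokes": len(jokes),
        "total_pages": -(-len(jokes) // limit),  # ceiling division
        "results": jokes[start : start + limit],
    }


all_jokes = fetch_all_results(fake_fetch)
print(len(all_jokes))
```

Against the real API, `fetch_page` could be a function that calls `requests.get` with the `headers` defined earlier and returns `.json()`. Comparing the final count with `total_jokes` is a cheap way to catch duplicated or missing pages.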

#### Export 

In [104]:
jokes_df.to_csv("../data/processed/dad-jokes.csv", index=False)