# Accessing data from an API

This notebook has two simple exercises demonstrating how to extract data from an [Application Programming Interface](https://en.wikipedia.org/wiki/API). An API is a way for computers or applications to interact with one another. In our case, we'll ask for data and the API will return it. These systems can be complicated, but most of the ones we might use in data journalism are relatively simple.

#### Import our data tools

In [65]:
%load_ext lab_black


In [66]:
import pandas as pd
import requests

In [67]:
pd.options.display.max_columns = 100
pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = None

---

## Cat facts!

[Read the documentation](https://alexwohlbruck.github.io/cat-facts/docs/)

#### Get random facts

In [68]:
cat_df = pd.read_json(
    "https://cat-fact.herokuapp.com/facts/random?animal_type=cat&amount=500"
)
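
The part of the URL after the `?` is a query string: `animal_type` and `amount` are parameters the API reads to decide what to send back. As a minimal sketch, the same URL can be assembled from a dictionary with the standard library's `urlencode`. The variable names here are illustrative and not part of the original notebook.

```python
from urllib.parse import urlencode

base_url = "https://cat-fact.herokuapp.com/facts/random"
params = {"animal_type": "cat", "amount": 500}

# urlencode turns the dictionary into "animal_type=cat&amount=500"
cat_url = f"{base_url}?{urlencode(params)}"
print(cat_url)
```

Passing `cat_url` to `pd.read_json` would request the same data as the cell above.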

#### First five rows

In [69]:
cat_df.head()

Unnamed: 0,status,_id,__v,text,source,updatedAt,type,createdAt,deleted,used,user
0,"{'verified': True, 'sentCount': 1}",591f9890d369931519ce352d,0.0,A cat's tongue has tiny barbs on it.,api,2020-08-23T20:20:01.611Z,cat,2018-04-20T20:20:02.940Z,False,0.0,5a9ac18c7478810ea6c06381
1,"{'verified': True, 'sentCount': 1}",5d38be370f1c57001592f158,0.0,Kittens typically begin to engage in playful behavior at around four weeks of age.,user,2020-08-23T20:20:01.611Z,cat,2019-07-24T20:23:19.615Z,False,0.0,5a9ac18c7478810ea6c06381
2,"{'verified': True, 'sentCount': 1}",591f98783b90f7150a19c1ad,0.0,"The catnip plant contains an oil called hepetalactone which does for cats what marijuana does to some people. Not all cats react to it those that do appear to enter a trancelike state. A positive reaction takes the form of the cat sniffing the catnip, then licking, biting, chewing it, rub & rolling on it repeatedly, purring, meowing & even leaping in the air.",api,2020-08-23T20:20:01.611Z,cat,2018-04-20T20:27:41.961Z,False,0.0,5a9ac18c7478810ea6c06381
3,"{'verified': True, 'sentCount': 1}",591f98a4d369931519ce3678,0.0,Cats are now Britain's favourite pet: there are 7.7 million cats as opposed to 6.6 million dogs.,api,2020-08-23T20:20:01.611Z,cat,2018-01-04T01:10:54.673Z,False,0.0,5a9ac18c7478810ea6c06381
4,"{'verified': True, 'sentCount': 1}",591f98703b90f7150a19c166,0.0,Mother cats teach their kittens to use the litter box.,api,2020-08-23T20:20:01.611Z,cat,2018-01-04T01:10:54.673Z,False,0.0,5a9ac18c7478810ea6c06381


#### How many records? 

In [70]:
len(cat_df)

500

#### What's the first fact?

In [71]:
cat_df["text"][0]

"A cat's tongue has tiny barbs on it."

#### Extract the nested JSON inside the `status` column

In [72]:
pd.json_normalize(cat_df["status"]).head()

Unnamed: 0,verified,sentCount
0,True,1
1,True,1
2,True,1
3,True,1
4,True,1


In [73]:
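# A random sample may not include every status field (such as feedback),
# so take the column names from the normalized frame instead of hard-coding them.
status_df = pd.json_normalize(cat_df["status"])
cat_df[list(status_df.columns)] = status_df
cat_df.head()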
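
`pd.json_normalize` turns each nested dictionary into its own set of columns. The sketch below shows the idea on a two-row dataframe. The second row's status values are made up for illustration, and joining on the index is just one possible way to attach the result.

```python
import pandas as pd

facts = pd.DataFrame(
    {
        "text": ["A cat's tongue has tiny barbs on it.", "Hello cat."],
        "status": [
            {"verified": True, "sentCount": 1},
            {"verified": None, "sentCount": 0},  # hypothetical values
        ],
    }
)

# One column per key found in the nested dictionaries
status = pd.json_normalize(facts["status"].tolist())

# Drop the nested column and attach the flattened ones, matching rows by index
flat = facts.drop(columns="status").join(status)
print(flat)
```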

#### Slim the dataframe

In [None]:
cat_df_slim = cat_df[["_id", "text", "createdAt", "verified"]].copy()

In [None]:
cat_df_slim.sample(5)

Unnamed: 0,_id,text,createdAt,verified
124,591f98703b90f7150a19c139,"Female cats are ""polyestrous,"" which means they may have many heat periods over the course of a year. A heat period lasts about 4 to 7 days if the female is bred; if she is not, the heat period lasts longer and recurs at regular intervals.",2018-01-04T01:10:54.673Z,True
379,61add919d8d914c027b872f9,Hello cat.,2021-12-06T09:34:17.366Z,
126,5b4911770508220014ccfe93,"Unlike kittens, adult cats don’t release any particular key hormones during sleep. They snooze all day just because they can. :)",2018-07-28T20:20:02.622Z,True
410,619fe3e761ba0ea74c673184,Asd.,2021-11-25T19:28:39.896Z,
254,58e007f50aac31001185ecf8,A group of cats is called a clowder.,2018-03-20T20:20:02.514Z,True


#### Just the verified facts, pls

In [None]:
verified_df = cat_df_slim[cat_df_slim["verified"] == True]
len(verified_df)

282

#### Find facts that mention specific words

In [None]:
len(verified_df[verified_df["text"].str.lower().str.contains("dog|food|toy")])

24
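
`str.contains` treats its pattern as a regular expression, so `dog|food|toy` matches any of the three words anywhere in the text, including inside longer words such as "dogs". A hedged sketch of two variations, using two facts from the output above plus a missing value: `case=False` replaces the separate `str.lower()` step, `na=False` counts missing text as no match, and `\b` word boundaries restrict the match to whole words.

```python
import pandas as pd

texts = pd.Series(
    [
        "Cats are now Britain's favourite pet: there are 7.7 million cats as opposed to 6.6 million dogs.",
        "Mother cats teach their kittens to use the litter box.",
        None,  # a missing value, to show what na=False does
    ]
)

# Substring match: "dogs" counts because it contains "dog"
loose = texts.str.contains("dog|food|toy", case=False, na=False)

# Whole-word match: "dogs" no longer counts
strict = texts.str.contains(r"\b(?:dog|food|toy)\b", case=False, na=False)

print(loose.sum(), strict.sum())
```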

#### Find the oldest fact? 

In [None]:
# assign() returns a new dataframe, which avoids pandas' SettingWithCopyWarning
verified_df = verified_df.assign(
    date=pd.to_datetime(verified_df["createdAt"]).dt.strftime("%Y-%m-%d")
)
verified_df.sort_values(["date"], ascending=True).head(1)


Unnamed: 0,_id,text,createdAt,verified,date
250,591f97a9ccb34a14d3f7dc90,The Pilgrims were the first to introduce cats to North America.,2018-01-04T01:10:54.673Z,True,2018-01-04


#### Most recent verified fact?

In [None]:
verified_df.sort_values(["date"], ascending=False).head(1)

Unnamed: 0,_id,text,createdAt,verified,date
50,5de780600013130015a3ccaf,"About one in two cats respond to catnip, and only develop a sensitivity to it at around 3 to 6 months of age.",2019-12-04T09:46:08.461Z,True,2019-12-04
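
Sorting the formatted `date` strings works because year-month-day dates sort alphabetically in chronological order. An alternative sketch keeps real timestamps and looks up the earliest and latest rows directly with `idxmin` and `idxmax`. The three rows are copied from outputs above, and the `created` column name is an assumption.

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "text": [
            "A group of cats is called a clowder.",
            "The Pilgrims were the first to introduce cats to North America.",
            "About one in two cats respond to catnip, and only develop a sensitivity to it at around 3 to 6 months of age.",
        ],
        "createdAt": [
            "2018-03-20T20:20:02.514Z",
            "2018-01-04T01:10:54.673Z",
            "2019-12-04T09:46:08.461Z",
        ],
    }
)

# Parse the ISO 8601 strings into timezone-aware timestamps
sample["created"] = pd.to_datetime(sample["createdAt"], utc=True)

# idxmin/idxmax return the index label of the earliest/latest timestamp
oldest = sample.loc[sample["created"].idxmin(), "text"]
newest = sample.loc[sample["created"].idxmax(), "text"]
print(oldest)
print(newest)
```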


---

## Dad jokes!

[Read the documentation](https://icanhazdadjoke.com/api#fetch-a-random-dad-joke)

#### Set the request headers so the API knows which format to return

In [74]:
headers = {
    "Accept": "application/json",
}

#### Get a response from the API in the format we requested

In [75]:
response = requests.get("https://icanhazdadjoke.com/search?page=1", headers=headers)
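
To see what `requests` will send without contacting the server, a request can be prepared first and inspected. This is a small sketch using `requests.Request(...).prepare()`. Passing `params` separately, rather than writing `?page=1` into the URL, is a stylistic choice and not something the original cell does.

```python
import requests

headers = {"Accept": "application/json"}

# Build the request object without sending it
prepared = requests.Request(
    "GET",
    "https://icanhazdadjoke.com/search",
    headers=headers,
    params={"page": 1},
).prepare()

print(prepared.url)
print(prepared.headers["Accept"])
```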

#### What comes back?

In [83]:
response.json()

{'current_page': 1,
 'limit': 20,
 'next_page': 2,
 'previous_page': 1,
 'results': [{'id': '0189hNRf2g',
   'joke': "I'm tired of following my dreams. I'm just going to ask them where they are going and meet up with them later."},
  {'id': '08EQZ8EQukb',
   'joke': "Did you hear about the guy whose whole left side was cut off? He's all right now."},
  {'id': '08xHQCdx5Ed',
   'joke': 'Why didn’t the skeleton cross the road? Because he had no guts.'},
  {'id': '0DQKB51oGlb',
   'joke': "What did one nut say as he chased another nut?  I'm a cashew!"},
  {'id': '0DtrrOZDlyd',
   'joke': "Chances are if you' ve seen one shopping center, you've seen a mall."},
  {'id': '0LuXvkq4Muc',
   'joke': "I knew I shouldn't steal a mixer from work, but it was a whisk I was willing to take."},
  {'id': '0ga2EdN7prc',
   'joke': 'How come the stadium got hot after the game? Because all of the fans left.'},
  {'id': '0oO71TSv4Ed',
   'joke': 'Why was it called the dark ages? Because of all the knights.

#### What's the limit per API call? 

In [84]:
response.json()["limit"]

20

#### How many total jokes? 

In [85]:
response.json()["total_jokes"]

649

#### How many pages of 20 jokes? 

In [87]:
response.json()["total_pages"]

33
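
The page count follows from the other two numbers: 649 jokes at 20 per page fill 32 pages, with 9 jokes left over for a 33rd. A quick check of that arithmetic:

```python
import math

total_jokes = 649
limit = 20

# Round up, because a partly filled last page still counts as a page
total_pages = math.ceil(total_jokes / limit)
print(total_pages)  # 33
```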

#### Ok, just the jokes

In [92]:
jokes_df = pd.DataFrame(response.json()["results"])

#### How many records?

In [93]:
len(jokes_df)

20

#### Get all the jokes with a loop
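
The notebook leaves this cell empty, so here is one possible sketch rather than the author's version. It requests page after page until the page number reaches `total_pages`, collecting each page's `results`. The function names (`fetch_page`, `collect_jokes`), the `timeout`, and the `pause` between requests are assumptions. The dry run at the bottom uses two made-up pages so the loop can be checked without calling the API.

```python
import time

import requests

SEARCH_URL = "https://icanhazdadjoke.com/search"
HEADERS = {"Accept": "application/json"}


def fetch_page(page):
    """Request one page of search results and return the parsed JSON."""
    response = requests.get(
        SEARCH_URL, headers=HEADERS, params={"page": page}, timeout=30
    )
    response.raise_for_status()
    return response.json()


def collect_jokes(get_page=fetch_page, pause=0.0):
    """Walk through every page and return all joke records as one list."""
    jokes = []
    page = 1
    while True:
        payload = get_page(page)
        jokes.extend(payload["results"])
        if page >= payload["total_pages"]:
            break
        page += 1
        time.sleep(pause)  # optional delay between requests
    return jokes


# Dry run with made-up pages, so no network access is needed
fake_pages = {
    1: {
        "results": [{"id": "a", "joke": "first"}, {"id": "b", "joke": "second"}],
        "total_pages": 2,
    },
    2: {"results": [{"id": "c", "joke": "third"}], "total_pages": 2},
}
dry_run = collect_jokes(get_page=fake_pages.__getitem__)
print(len(dry_run))
```

With network access, `all_jokes_df = pd.DataFrame(collect_jokes(pause=0.5))` would build a dataframe of every joke (the name `all_jokes_df` is hypothetical), and `len(all_jokes_df)` should then match `total_jokes`.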

#### How many? 

#### Export 
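
This cell is also empty in the notebook. A minimal sketch of one export option, assuming a CSV file is wanted: `to_csv` with `index=False` leaves out the row numbers. The file name `dad_jokes.csv` is hypothetical, and the two rows are the first two jokes from the response above, standing in for the full set.

```python
import pandas as pd

sample_jokes = pd.DataFrame(
    [
        {
            "id": "0189hNRf2g",
            "joke": "I'm tired of following my dreams. I'm just going to ask them where they are going and meet up with them later.",
        },
        {
            "id": "08EQZ8EQukb",
            "joke": "Did you hear about the guy whose whole left side was cut off? He's all right now.",
        },
    ]
)

# Write the dataframe to disk without the index column
sample_jokes.to_csv("dad_jokes.csv", index=False)

# Read it back to confirm the rows and columns survived the round trip
check = pd.read_csv("dad_jokes.csv")
print(check.shape)
```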