## APIs Exercises
### Part A
1) Access the [Cat Breeds](https://catfact.ninja/breeds) API and download data from all pages.

In [1]:
import requests
import pandas as pd
from time import sleep
import logging

logger = logging.getLogger(__name__)
logger.addHandler(logging.FileHandler('log.txt', 'a'))
logger.handlers[0].setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)

BASE_URL = "https://catfact.ninja/breeds"
response = requests.get(BASE_URL)
response.status_code, response.reason

(200, 'OK')

In [2]:
response.json().keys()

dict_keys(['current_page', 'data', 'first_page_url', 'from', 'last_page', 'last_page_url', 'links', 'next_page_url', 'path', 'per_page', 'prev_page_url', 'to', 'total'])

In [3]:
response.json()["next_page_url"]

'https://catfact.ninja/breeds?page=2'

In [4]:
data_list = response.json()['data']

while response.json().get('next_page_url'):
    print('fetching next page')
    # wait 10 seconds between requests to avoid '429 Too Many Requests' errors
    sleep(10)
    # request the next page
    response = requests.get(response.json()['next_page_url'])
    print(response.status_code, response.reason)
    # stop the loop if the request fails instead of parsing an error body
    if not response.ok:
        break
    page = response.json()
    print(f"received page {page['current_page']}/{page['last_page']}")
    data_list.extend(page['data'])
len(data_list)

fetching next page
200 OK
received page 2/4
fetching next page
200 OK
received page 3/4
fetching next page
200 OK
received page 4/4


98
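The loop above follows `next_page_url` until the API stops returning one. A minimal sketch of the same idea as a reusable generator, assuming only that each page is a JSON object with `data` and `next_page_url` keys as shown above. `iter_pages` and the injected `fetch_json` callable are hypothetical names; passing the fetcher in keeps the pagination logic testable without network access.

```python
import time
from typing import Callable, Iterator


def iter_pages(
    first_url: str,
    fetch_json: Callable[[str], dict],
    delay: float = 0.0,
) -> Iterator[dict]:
    """Yield every record from an API that links its pages via `next_page_url`.

    `fetch_json` (hypothetical) takes a URL and returns the decoded JSON page.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        # each page carries its records under 'data'
        yield from page.get("data", [])
        # a null or missing next_page_url ends the loop
        url = page.get("next_page_url")
        if url and delay:
            # pause between requests to stay under the server's rate limit
            time.sleep(delay)
```

With `requests`, a fetcher could be as simple as `lambda url: requests.get(url).json()`, giving `records = list(iter_pages(BASE_URL, fetcher, delay=10))`.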

In [5]:
data_list[:3]

[{'breed': 'Abyssinian',
  'country': 'Ethiopia',
  'origin': 'Natural/Standard',
  'coat': 'Short',
  'pattern': 'Ticked'},
 {'breed': 'Aegean',
  'country': 'Greece',
  'origin': 'Natural/Standard',
  'coat': 'Semi-long',
  'pattern': 'Bi- or tri-colored'},
 {'breed': 'American Curl',
  'country': 'United States',
  'origin': 'Mutation',
  'coat': 'Short/Long',
  'pattern': 'All'}]

In [6]:
df = pd.DataFrame(data_list)

In [7]:
df.shape

(98, 5)

In [8]:
df.head()

| | breed | country | origin | coat | pattern |
|---|---|---|---|---|---|
| 0 | Abyssinian | Ethiopia | Natural/Standard | Short | Ticked |
| 1 | Aegean | Greece | Natural/Standard | Semi-long | Bi- or tri-colored |
| 2 | American Curl | United States | Mutation | Short/Long | All |
| 3 | American Bobtail | United States | Mutation | Short/Long | All |
| 4 | American Shorthair | United States | Natural | Short | All but colorpoint |


In [9]:
df["breed"].nunique()

98

2) Which country has the highest number of cat breeds?

In [10]:
df.groupby("country")["breed"].count().sort_values(
    ascending=False
).head(1)

country
United States    28
Name: breed, dtype: int64
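`head(1)` returns a single country even if several share the top count. A sketch that keeps every tied country; `value_counts()` counts rows per country, which matches `groupby("country")["breed"].count()` as long as no `breed` value is missing. The helper name `countries_with_most_breeds` is hypothetical.

```python
import pandas as pd


def countries_with_most_breeds(breeds: pd.DataFrame) -> pd.Series:
    """Return every country tied for the highest number of breeds."""
    # value_counts() counts rows per country, sorted in descending order
    counts = breeds["country"].value_counts()
    # keep all countries sharing the maximum, not just the first one
    return counts[counts == counts.max()]
```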

3) What is the percentage of Hairless breeds?

In [11]:
hairless_breed_prc = (
    round(len(df[df["coat"].str.contains("Hairless")]) / len(df) * 100, 2)
)

In [12]:
hairless_breed_prc

8.16
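With 98 breeds, 8.16% corresponds to 8 hairless breeds (8 / 98 ≈ 0.0816). The same percentage can be computed as the mean of a boolean mask, since True counts as 1 and False as 0. A sketch, assuming the `coat` column may contain missing values (`na=False` treats them as non-matches); `percentage_containing` is a hypothetical helper.

```python
import pandas as pd


def percentage_containing(column: pd.Series, text: str) -> float:
    """Percentage of rows whose value contains `text`, rounded to 2 decimals."""
    # na=False treats missing values as "no match" instead of propagating NaN
    mask = column.str.contains(text, na=False)
    # the mean of a boolean Series is the fraction of True values
    return round(mask.mean() * 100, 2)
```

Applied to this notebook it would read `percentage_containing(df["coat"], "Hairless")`.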

### Part B
Create an account at https://api-ninjas.com/
1) Enhance the previous dataset with additional cat data: use the API Ninjas Cats API to fetch data for each breed, then merge both datasets into one dataframe.
(Use the cat breed names you collected earlier.)

In [13]:
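# API Ninjas Cats API
import os

name = "A"
# read the key from an environment variable instead of hard-coding it
# (the variable name API_NINJAS_KEY is an arbitrary choice)
api_key = os.environ["API_NINJAS_KEY"]
url = f"https://api.api-ninjas.com/v1/cats?name={name}"
headers = {"X-Api-Key": api_key}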

In [14]:
response = requests.get(url, headers=headers)

In [15]:
response.json()[0]

{'length': '12 to 16 inches',
 'origin': 'Southeast Asia',
 'image_link': 'https://api-ninjas.com/images/cats/abyssinian.jpg',
 'family_friendly': 3,
 'shedding': 3,
 'general_health': 2,
 'playfulness': 5,
 'children_friendly': 5,
 'grooming': 3,
 'intelligence': 5,
 'other_pets_friendly': 5,
 'min_weight': 6.0,
 'max_weight': 10.0,
 'min_life_expectancy': 9.0,
 'max_life_expectancy': 15.0,
 'name': 'Abyssinian'}

In [16]:
lst = []
# query the Cats API once for each breed name in the initial dataset
for breed in df["breed"]:
    response = requests.get(
        "https://api.api-ninjas.com/v1/cats",
        params={"name": breed},  # requests URL-encodes the breed name
        headers=headers,
    )
    # skip breeds whose request failed (an error response is not a list of cats)
    if not response.ok:
        logger.warning(f"{breed}, {response.status_code}, {response.reason}")
        continue
    breed_list = response.json()
    logger.info(f"{breed}, {response.status_code}, {response.reason}, data list size: {len(breed_list)}")
    for d in breed_list:
        # tag each retrieved entry with the breed name from the initial dataset
        d['breed'] = breed
    lst.extend(breed_list)
df2 = pd.DataFrame(lst)
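Breed names must be URL-encoded before they go into the query string; a name containing `&`, `/` or `#` would otherwise corrupt it. `requests` performs this encoding when the name is passed via `params={"name": breed}`. A minimal standard-library sketch of the same encoding; `cats_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

CATS_ENDPOINT = "https://api.api-ninjas.com/v1/cats"


def cats_url(breed: str) -> str:
    """Build the Cats API URL for one breed, percent-encoding the name."""
    # urlencode escapes spaces, '&', '/', etc. so a name cannot break the query string
    return f"{CATS_ENDPOINT}?{urlencode({'name': breed})}"
```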

In [17]:
# create a dataframe by merging the two dataframes
merged_df = pd.merge(
    df, df2, on="breed", how="inner"
)
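`how="inner"` silently drops breeds for which the Cats API returned nothing. To see which breeds were lost, a left merge with `indicator=True` adds a `_merge` column that flags them. A sketch, with `unmatched_breeds` as a hypothetical helper that would be called as `unmatched_breeds(df, df2)`:

```python
import pandas as pd


def unmatched_breeds(breeds: pd.DataFrame, api_cats: pd.DataFrame) -> list:
    """List breeds from `breeds` that have no matching row in `api_cats`."""
    # a left merge keeps every breed; indicator=True adds a '_merge' column
    # whose value is 'left_only' where the API returned no match
    check = pd.merge(
        breeds[["breed"]],
        api_cats[["breed"]].drop_duplicates(),
        on="breed",
        how="left",
        indicator=True,
    )
    return check.loc[check["_merge"] == "left_only", "breed"].tolist()
```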

In [18]:
merged_df.head()

| | breed | country | origin_x | coat | pattern | length | origin_y | image_link | family_friendly | shedding | ... | grooming | intelligence | other_pets_friendly | min_weight | max_weight | min_life_expectancy | max_life_expectancy | name | meowing | stranger_friendly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abyssinian | Ethiopia | Natural/Standard | Short | Ticked | 12 to 16 inches | Southeast Asia | https://api-ninjas.com/images/cats/abyssinian.jpg | 3 | 3 | ... | 3 | 5.0 | 5 | 6.0 | 10.0 | 9.0 | 15.0 | Abyssinian | | |
| 1 | Aegean | Greece | Natural/Standard | Semi-long | Bi- or tri-colored | Medium | Greece | https://api-ninjas.com/images/cats/aegean.jpg | 5 | 3 | ... | 4 | 4.0 | 3 | 7.0 | 10.0 | 9.0 | 10.0 | Aegean | 4.0 | 4.0 |
| 2 | American Curl | United States | Mutation | Short/Long | All | Medium | California, USA | https://api-ninjas.com/images/cats/american_cu... | 5 | 4 | ... | 4 | | 4 | 5.0 | 10.0 | 12.0 | 16.0 | American Curl | 5.0 | 4.0 |
| 3 | American Bobtail | United States | Mutation | Short/Long | All | Medium | United States and Canada | https://api-ninjas.com/images/cats/american_bo... | 4 | 4 | ... | 3 | 4.0 | 4 | 8.0 | 13.0 | 11.0 | 15.0 | American Bobtail | 3.0 | 4.0 |
| 4 | American Shorthair | United States | Natural | Short | All but colorpoint | 12 to 15 inches | United States | https://api-ninjas.com/images/cats/american_sh... | 3 | 3 | ... | 4 | 4.0 | 3 | 7.0 | 12.0 | 15.0 | 20.0 | American Shorthair | | 4.0 |


In [19]:
# combine breed and name into a label that uniquely identifies each row
# (one breed name can match several API entries, e.g. Burmese -> European Burmese)
merged_df.insert(0, 'breed-name', merged_df['breed'] + '-' + merged_df['name'])

In [20]:
# set it as index
merged_df.set_index('breed-name', inplace=True)

In [21]:
merged_df.columns

Index(['breed', 'country', 'origin_x', 'coat', 'pattern', 'length', 'origin_y',
       'image_link', 'family_friendly', 'shedding', 'general_health',
       'playfulness', 'children_friendly', 'grooming', 'intelligence',
       'other_pets_friendly', 'min_weight', 'max_weight',
       'min_life_expectancy', 'max_life_expectancy', 'name', 'meowing',
       'stranger_friendly'],
      dtype='object')
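Both sources have an `origin` column, so pandas appends its default suffixes `_x` and `_y`. In the catfact data `origin` describes how the breed arose (e.g. Natural/Standard, Mutation), while the Cats API uses it for a geographic region. Passing `suffixes` to `pd.merge` gives the columns self-explanatory names; the suffixes `_type` and `_region` below are illustrative choices, and the two one-row frames reuse the Abyssinian values shown above.

```python
import pandas as pd

# both sources have an 'origin' column with a different meaning
catfact = pd.DataFrame({"breed": ["Abyssinian"], "origin": ["Natural/Standard"]})
ninjas = pd.DataFrame({"breed": ["Abyssinian"], "origin": ["Southeast Asia"]})

# suffixes replaces the default ('_x', '_y') with descriptive names
merged = pd.merge(
    catfact, ninjas, on="breed", how="inner", suffixes=("_type", "_region")
)
```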

2) Which country has the heaviest cats?

In [22]:
# midpoint of each breed's weight range
merged_df['weight'] = (merged_df['max_weight'] + merged_df['min_weight']) / 2

In [23]:
# mean breed weight per country, heaviest first
merged_df.groupby('country')['weight'].mean().sort_values(ascending=False).head(1)

country
France    17.0
Name: weight, dtype: float64
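A country's mean may rest on a single breed, so the top result can reflect one large breed rather than a national trend. Reporting the number of breeds alongside the mean makes this visible. A sketch using the same midpoint of `min_weight` and `max_weight`; `heaviest_countries` is a hypothetical helper, and the sample rows reuse the weight ranges from the merged table above.

```python
import pandas as pd


def heaviest_countries(cats: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Mean midpoint weight per country, with the number of breeds behind it."""
    # midpoint of each breed's weight range
    weights = cats.assign(weight=(cats["min_weight"] + cats["max_weight"]) / 2)
    summary = weights.groupby("country")["weight"].agg(["mean", "count"])
    # sort by mean weight; 'count' shows how many breeds each mean is based on
    return summary.sort_values("mean", ascending=False).head(n)
```

Run on `merged_df`, a `count` of 1 next to the top mean would indicate a single-breed result.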

3) Which cats are the most friendly and playful (find the top 5)? What is their life expectancy?

In [24]:
friendly_cols = ['children_friendly', 'family_friendly', 'other_pets_friendly', 'stranger_friendly']
# average the four friendliness ratings, then average that with playfulness
friendliness = sum(merged_df[col] for col in friendly_cols) / 4
merged_df['friendly_playful'] = round((friendliness + merged_df['playfulness']) / 2, 1)
# midpoint of the life-expectancy range
merged_df['life_expectancy'] = round((merged_df['max_life_expectancy'] + merged_df['min_life_expectancy']) / 2, 1)
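The score computed above weights the mean of the four friendliness ratings and the playfulness rating equally. Writing $c$, $f$, $o$, $s$ for `children_friendly`, `family_friendly`, `other_pets_friendly`, `stranger_friendly` and $p$ for `playfulness`:

$$
\text{friendly\_playful} = \frac{1}{2}\left(\frac{c + f + o + s}{4} + p\right)
$$

Because the ratings are added element-wise without skipping missing values, a breed with any of the five ratings missing (such as the empty `stranger_friendly` for the Abyssinian in the merged table) gets a NaN score. `sort_values` places NaN last, so such breeds cannot appear in the top 5.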

In [25]:
merged_df[['friendly_playful', 'life_expectancy']].sort_values('friendly_playful', ascending=False).head()

| breed-name | friendly_playful | life_expectancy |
|---|---|---|
| Cornish Rex-Cornish Rex | 5.0 | 13.0 |
| Burmese-European Burmese | 5.0 | 12.5 |
| Somali-Somali | 4.9 | 13.5 |
| Bengal-Bengal Cats | 4.8 | 13.0 |
| Siberian-Siberian | 4.8 | 14.5 |
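The 5th place is shared by at least two breeds at 4.8, and `head()` cuts any further ties off arbitrarily. `DataFrame.nlargest` with `keep="all"` returns every row tied with the n-th value, even if that means more than n rows. A sketch, with `top_friendly_playful` as a hypothetical helper and sample scores copied from the table above:

```python
import pandas as pd


def top_friendly_playful(cats: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Top-n rows by friendly_playful score, plus any rows tied with the n-th."""
    # keep="all" retains every row tied with the n-th largest score
    return cats.nlargest(n, "friendly_playful", keep="all")[
        ["friendly_playful", "life_expectancy"]
    ]
```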


### Part C (Bonus)
Create an account at [Giant Bomb](https://www.giantbomb.com/). Get your [API key](https://www.giantbomb.com/api/) from the API page.
Now use the following link in your Python script to get data from the API:

```
https://www.giantbomb.com/api/games/?filter=platforms%3A35&field_list=id%2Cname%2Coriginal_game_rating%2Coriginal_release_date&sort=name%3Adesc&limit=10&offset=0&api_key=___your___api___key___&format=json
```
- The link contains an `offset` parameter. You have to change it repeatedly to retrieve all the results from the API.
- To fetch a single page, you can use the following code with the requests library:

``` python
import requests
import json

link = "https://www.giantbomb.com/api/games/?filter=platforms%3A35&field_list=id%2Cname%2Coriginal_game_rating%2Coriginal_release_date&sort=name%3Adesc&limit=10&offset=0&api_key=___your___api___key___&format=json"

headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}

response = requests.get(link, headers=headers)
json.loads(response.content)
```
- Gather all the data and create a pandas DataFrame.
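Only `offset` changes between requests: each page starts where the previous one ended, at a multiple of `limit`. A sketch that builds every page URL with the standard library, reusing the query parameters from the link above; `page_urls` is a hypothetical helper, and `limit=100` is simply the value used later in this notebook. For example, 1,736 results at 100 per page take 18 requests (offsets 0 to 1,700).

```python
from urllib.parse import urlencode

GAMES_ENDPOINT = "https://www.giantbomb.com/api/games/"


def page_urls(api_key: str, total: int, limit: int = 100) -> list:
    """Build one Giant Bomb /games/ URL per page of `limit` results."""
    urls = []
    for offset in range(0, total, limit):
        params = {
            "filter": "platforms:35",
            "field_list": "id,name,original_game_rating,original_release_date",
            "sort": "name:desc",
            "limit": limit,
            "offset": offset,  # index of the first result on this page
            "api_key": api_key,
            "format": "json",
        }
        # urlencode escapes ':' and ',' as %3A and %2C, matching the link above
        urls.append(f"{GAMES_ENDPOINT}?{urlencode(params)}")
    return urls
```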

In [34]:
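import os

# read the key from an environment variable instead of hard-coding it
# (the variable name GIANT_BOMB_API_KEY is an arbitrary choice)
api_key = os.environ["GIANT_BOMB_API_KEY"]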
link = f"https://www.giantbomb.com/api/games/?filter=platforms%3A35&field_list=id%2Cname%2Coriginal_game_rating%2Coriginal_release_date&sort=name%3Adesc&limit=10&offset=0&api_key={api_key}&format=json"
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}

In [35]:
import requests
import json

response = requests.get(link, headers=headers)
response_dict = json.loads(response.content)
response_dict.keys()

dict_keys(['error', 'limit', 'offset', 'number_of_page_results', 'number_of_total_results', 'status_code', 'results', 'version'])

In [36]:
response_dict['results'][0]

{'id': 31743,
 'name': 'Zumba Fitness',
 'original_game_rating': [{'api_detail_url': 'https://www.giantbomb.com/api/game_rating/3065-6/',
   'id': 6,
   'name': 'ESRB: E'},
  {'api_detail_url': 'https://www.giantbomb.com/api/game_rating/3065-7/',
   'id': 7,
   'name': 'PEGI: 3+'},
  {'api_detail_url': 'https://www.giantbomb.com/api/game_rating/3065-14/',
   'id': 14,
   'name': 'OFLC: G'}],
 'original_release_date': '2010-11-30'}

In [37]:
# get total results
total_results = response_dict["number_of_total_results"]
total_results


1736

In [38]:
limit = 100  # results per page; a larger limit means fewer requests

accumulated_results = []
for offset in range(0, total_results, limit):
    link = f"https://www.giantbomb.com/api/games/?filter=platforms%3A35&field_list=id%2Cname%2Coriginal_game_rating%2Coriginal_release_date&sort=name%3Adesc&limit={limit}&offset={offset}&api_key={api_key}&format=json"
    page_response = requests.get(link, headers=headers)
    logger.info(f"fetched results {offset + 1} to {offset + limit}, response code: {page_response.status_code}, {page_response.reason}")
    # parse the current page's JSON into a dict
    page_response_dict = json.loads(page_response.content)
    # append this page's results to the accumulated list
    accumulated_results.extend(page_response_dict['results'])
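The loop above trusts every page: a failed request (bad key, rate limiting) would surface only as a `KeyError` on `results`. A sketch that checks each page and takes the total from the API responses themselves, with the fetcher injected so the logic can be tested offline. Assumption: the response's `status_code` field equals `1` on success; confirm this in Giant Bomb's API documentation before relying on it. `collect_all` and `fetch_page` are hypothetical names.

```python
from typing import Callable, List


def collect_all(fetch_page: Callable[[int, int], dict], limit: int = 100) -> List[dict]:
    """Page through an offset/limit API until every result has been collected.

    `fetch_page(offset, limit)` must return the decoded JSON for one page.
    """
    results: List[dict] = []
    offset, total = 0, None
    while total is None or offset < total:
        page = fetch_page(offset, limit)
        # assumption: Giant Bomb reports success as status_code == 1
        if page.get("status_code") != 1:
            raise RuntimeError(f"API error at offset {offset}: {page.get('error')}")
        # read the total from the API instead of a separate first request
        total = page["number_of_total_results"]
        batch = page["results"]
        if not batch:  # defensive stop if a page comes back empty early
            break
        results.extend(batch)
        offset += limit
    return results
```

With `requests`, `fetch_page` could wrap the `requests.get(link, headers=headers)` call above, formatting `offset` and `limit` into the URL and returning `json.loads(response.content)`.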

In [39]:
df = pd.DataFrame(accumulated_results)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1736 entries, 0 to 1735
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   id                     1736 non-null   int64 
 1   name                   1736 non-null   object
 2   original_game_rating   1430 non-null   object
 3   original_release_date  1630 non-null   object
dtypes: int64(1), object(3)
memory usage: 54.4+ KB
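`df.info()` shows `original_release_date` stored as `object` (strings), with 106 missing values (1,736 minus 1,630 non-null). Converting it to a datetime dtype enables date filtering and the `.dt` accessor. A sketch, with `parse_release_dates` as a hypothetical helper; `errors="coerce"` turns missing or unparseable dates into `NaT` instead of raising.

```python
import pandas as pd


def parse_release_dates(games: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with original_release_date converted to datetime64."""
    out = games.copy()
    # errors="coerce" maps missing or malformed dates to NaT instead of raising
    out["original_release_date"] = pd.to_datetime(
        out["original_release_date"], errors="coerce"
    )
    return out
```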


In [40]:
df.head()

| | id | name | original_game_rating | original_release_date |
|---|---|---|---|---|
| 0 | 31743 | Zumba Fitness | [{'api_detail_url': 'https://www.giantbomb.com... | 2010-11-30 |
| 1 | 10225 | Zone of the Enders: The 2nd Runner | [{'api_detail_url': 'https://www.giantbomb.com... | 2003-02-13 |
| 2 | 35511 | Zone of the Enders HD Collection | [{'api_detail_url': 'https://www.giantbomb.com... | 2012-11-30 |
| 3 | 11350 | Zone of the Enders | [{'api_detail_url': 'https://www.giantbomb.com... | 2001-03-02 |
| 4 | 41638 | Zillions of Enemy X: Zetsukai no Crusade | [{'api_detail_url': 'https://www.giantbomb.com... | 2013-05-23 |
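Each `original_game_rating` cell holds a list of rating dicts (as in the Zumba Fitness record above), or a missing value for the 306 games without one. A sketch that flattens a cell into plain rating names filtered by rating board; `rating_names` and its `prefix` filter are illustrative choices, and the test data is the Zumba Fitness record shown earlier.

```python
def rating_names(ratings, prefix: str = "ESRB") -> list:
    """Extract rating names (e.g. 'ESRB: E') that start with `prefix`."""
    # missing ratings arrive as NaN/None rather than a list
    if not isinstance(ratings, list):
        return []
    return [r["name"] for r in ratings if r.get("name", "").startswith(prefix)]
```

Applied column-wise, `df["original_game_rating"].apply(rating_names)` yields one list of ESRB ratings per game.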


In [41]:
df['id'].nunique()

1736