# Lab | APIs

In this lab, you will collect historical data about Nobel Prize winners using [this free, non-authenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes), the base URL is "http://api.nobelprize.org/2.1/", followed by a string that specifies what kind of information you want to retrieve. The acceptable options are:

* nobelPrizes
* nobelPrize/category/year
* laureates
* laureate/laureateID
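
As a minimal sketch of how these options combine with the base URL, the helper below joins an endpoint name with optional path segments. The name `build_url` is hypothetical (it is not part of the API or of `requests`); the category code `che` and the laureate ID `160` are taken from the sample responses shown later in this lab.

```python
BASE_URL = "http://api.nobelprize.org/2.1/"


def build_url(endpoint, *path_parts):
    """Join the base URL, an endpoint name, and optional path segments."""
    segments = [endpoint, *(str(part) for part in path_parts)]
    return BASE_URL + "/".join(segments)


print(build_url("nobelPrizes"))
print(build_url("nobelPrize", "che", 1901))
print(build_url("laureate", 160))
```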

# Getting the information using requests

Use the Python `requests` and `json` libraries to obtain the information on ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The JSON output is plain text that needs to be converted into the corresponding nested dictionary. Use the `.json()` method to parse the output into a Python dictionary.

Use the pandas library to collect all the information into a pandas DataFrame.
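
The text-to-dictionary-to-DataFrame path can be tried offline before calling the API. The sketch below assumes a heavily abridged stand-in for the response body (`raw_text` is illustrative, not a real response) and uses `json.loads`, which parses the text the same way `.json()` does on a `requests` response.

```python
import json

import pandas as pd

# Abridged stand-in for the response body; a real response has many more fields.
raw_text = (
    '{"nobelPrizes": [{"awardYear": "1901", '
    '"category": {"en": "Chemistry"}, "prizeAmount": 150782}]}'
)

# Plain text -> nested Python dictionary.
payload = json.loads(raw_text)

# The list stored under "nobelPrizes" becomes one DataFrame row per prize.
prizes_df = pd.DataFrame(payload["nobelPrizes"])
print(prizes_df.columns.tolist())
```

Nested values such as `category` stay as dictionaries in the DataFrame cells, so they still need to be unpacked in a later step.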

In [4]:
import requests
import json
import pandas as pd

# Define the URL for the Nobel Prize API
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

# Make the GET request to the API
response = requests.get(url)

# Check if the response was successful
if response.status_code == 200:
    print("All good!")
    print("==============")
    print("\n")
    
    # Proceed to process the data (for demonstration, we'll just show part of the JSON data)
    data = response.json()
    # Display the first few entries to understand the structure
    sample_data = data['nobelPrizes'][:2]  # Display the first 2 entries
    print(sample_data)
else:
    print("Failed to retrieve data. Status code:", response.status_code)


All good!


[{'awardYear': '1901', 'category': {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}, 'categoryFullName': {'en': 'The Nobel Prize in Chemistry', 'no': 'Nobelprisen i kjemi', 'se': 'Nobelpriset i kemi'}, 'dateAwarded': '1901-11-12', 'prizeAmount': 150782, 'prizeAmountAdjusted': 10531894, 'links': [{'rel': 'nobelPrize', 'href': 'https://api.nobelprize.org/2/nobelPrize/che/1901', 'action': 'GET', 'types': 'application/json'}], 'laureates': [{'id': '160', 'knownName': {'en': "Jacobus H. van 't Hoff"}, 'fullName': {'en': "Jacobus Henricus van 't Hoff"}, 'portion': '1', 'sortOrder': '1', 'motivation': {'en': 'in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions', 'se': 'såsom ett erkännande av den utomordentliga förtjänst han inlagt genom upptäckten av lagarna för den kemiska dynamiken och för det osmotiska trycket i lösningar'}, 'links': [{'rel': 'laureate', 'href': 'https://api.nobelprize

# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)
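
One possible way to arrive at these columns is sketched below. It runs on a single abridged record copied from the sample output above instead of the live response, and the helper name `flatten_prize` is hypothetical. Two assumptions are made: the motivation is taken from the first laureate of each prize, and the `laureates` key is treated as optional.

```python
import pandas as pd

# Abridged record copied from the sample output; the real list would come
# from response.json()["nobelPrizes"].
sample_prizes = [
    {
        "awardYear": "1901",
        "category": {"en": "Chemistry", "no": "Kjemi", "se": "Kemi"},
        "dateAwarded": "1901-11-12",
        "prizeAmount": 150782,
        "prizeAmountAdjusted": 10531894,
        "laureates": [
            {
                "id": "160",
                "motivation": {
                    "en": "in recognition of the extraordinary services he has "
                    "rendered by the discovery of the laws of chemical "
                    "dynamics and osmotic pressure in solutions"
                },
            }
        ],
    }
]


def flatten_prize(prize):
    """Reduce one nested prize record to the flat columns requested above."""
    laureates = prize.get("laureates", [])  # assumption: key may be absent
    first_motivation = (
        laureates[0].get("motivation", {}).get("en") if laureates else None
    )
    return {
        "category": prize["category"]["en"],
        "dateAwarded": prize.get("dateAwarded"),
        "prizeAmount": prize["prizeAmount"],
        "prizeAmountAdjusted": prize["prizeAmountAdjusted"],
        "Number_of_laureates": len(laureates),
        "motivation": first_motivation,
        "laureate_ids": [laureate["id"] for laureate in laureates],
    }


prizes_df = pd.DataFrame([flatten_prize(prize) for prize in sample_prizes])
# Convert the yyyy-mm-dd strings into a datetime column.
prizes_df["dateAwarded"] = pd.to_datetime(prizes_df["dateAwarded"], format="%Y-%m-%d")
print(prizes_df.dtypes)
```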

In [6]:
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code == 200:
    prizes_list = response.json()['nobelPrizes']
    print(f"Number of Nobel Prizes fetched: {len(prizes_list)}")


Number of Nobel Prizes fetched: 670


In [12]:
for i, prize in enumerate(prizes_list[:3]):
    print(f"Prize {i+1}: {prize}")

Prize 1: {'awardYear': '1901', 'category': {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}, 'categoryFullName': {'en': 'The Nobel Prize in Chemistry', 'no': 'Nobelprisen i kjemi', 'se': 'Nobelpriset i kemi'}, 'dateAwarded': '1901-11-12', 'prizeAmount': 150782, 'prizeAmountAdjusted': 10531894, 'links': [{'rel': 'nobelPrize', 'href': 'https://api.nobelprize.org/2/nobelPrize/che/1901', 'action': 'GET', 'types': 'application/json'}], 'laureates': [{'id': '160', 'knownName': {'en': "Jacobus H. van 't Hoff"}, 'fullName': {'en': "Jacobus Henricus van 't Hoff"}, 'portion': '1', 'sortOrder': '1', 'motivation': {'en': 'in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions', 'se': 'såsom ett erkännande av den utomordentliga förtjänst han inlagt genom upptäckten av lagarna för den kemiska dynamiken och för det osmotiska trycket i lösningar'}, 'links': [{'rel': 'laureate', 'href': 'https://api.nobelprize.org

In [14]:
import requests

url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"
response = requests.get(url)

if response.status_code == 200:
    print("Request was successful!")
    
    prizes_list = response.json()['nobelPrizes']
    print(f"Number of Nobel Prizes fetched: {len(prizes_list)}")
    
    for i, prize in enumerate(prizes_list[:3]):  # Display the first 3 prizes
        print(f"Prize {i+1}: {prize}")
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")


Request was successful!
Number of Nobel Prizes fetched: 670
Prize 1: {'awardYear': '1901', 'category': {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}, 'categoryFullName': {'en': 'The Nobel Prize in Chemistry', 'no': 'Nobelprisen i kjemi', 'se': 'Nobelpriset i kemi'}, 'dateAwarded': '1901-11-12', 'prizeAmount': 150782, 'prizeAmountAdjusted': 10531894, 'links': [{'rel': 'nobelPrize', 'href': 'https://api.nobelprize.org/2/nobelPrize/che/1901', 'action': 'GET', 'types': 'application/json'}], 'laureates': [{'id': '160', 'knownName': {'en': "Jacobus H. van 't Hoff"}, 'fullName': {'en': "Jacobus Henricus van 't Hoff"}, 'portion': '1', 'sortOrder': '1', 'motivation': {'en': 'in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions', 'se': 'såsom ett erkännande av den utomordentliga förtjänst han inlagt genom upptäckten av lagarna för den kemiska dynamiken och för det osmotiska trycket i lösningar'}, 'link

In [18]:
import requests
import json

url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"
response = requests.get(url)

if response.status_code == 200:
    print("Request was successful!")
    
    prizes_list = response.json()['nobelPrizes']
    print(f"Number of Nobel Prizes fetched: {len(prizes_list)}")
    
    for i, prize in enumerate(prizes_list[:3]):  
        print(f"\nPrize {i+1}:")
        print(json.dumps(prize, indent=4))  
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")


Request was successful!
Number of Nobel Prizes fetched: 670

Prize 1:
{
    "awardYear": "1901",
    "category": {
        "en": "Chemistry",
        "no": "Kjemi",
        "se": "Kemi"
    },
    "categoryFullName": {
        "en": "The Nobel Prize in Chemistry",
        "no": "Nobelprisen i kjemi",
        "se": "Nobelpriset i kemi"
    },
    "dateAwarded": "1901-11-12",
    "prizeAmount": 150782,
    "prizeAmountAdjusted": 10531894,
    "links": [
        {
            "rel": "nobelPrize",
            "href": "https://api.nobelprize.org/2/nobelPrize/che/1901",
            "action": "GET",
            "types": "application/json"
        }
    ],
    "laureates": [
        {
            "id": "160",
            "knownName": {
                "en": "Jacobus H. van 't Hoff"
            },
            "fullName": {
                "en": "Jacobus Henricus van 't Hoff"
            },
            "portion": "1",
            "sortOrder": "1",
            "motivation": {
                "en"

# Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some laureate_ids, you will notice that the Nobel Prize was not always awarded to individuals. In some cases, the awards were given to institutions.

Get the unique ids from the previous datasets and prepare the following functions:

- get_name(laureate) (it should return the English name: 'fullName' for an individual or 'orgName' for an institution)

- get_gender(laureate) (it should return the gender or 'Unknown' for individuals, and 'None' for institutions)

- get_birthdate(laureate) (it should return the birthdate when it's available or 'Unknown' otherwise)

- get_age(laureate) (it should return the age of the awarded individual, or 'Unknown' when it's not available or for institutions)

- get_city(laureate) (it should return the English name of the city when it's available or 'Unknown' otherwise)

- get_country(laureate) (it should return the English name of the country when it's available or 'Unknown' otherwise)

- get_continent(laureate) (it should return the English name of the continent when it's available or 'Unknown' otherwise)

- get_latitude(laureate) (it should return the city's latitude when it's available or 'Unknown' otherwise)

- get_longitude(laureate) (it should return the city's longitude when it's available or 'Unknown' otherwise)
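
Most of these functions follow the same pattern: walk down a chain of nested keys and fall back to 'Unknown' when any key is missing. A minimal sketch of that pattern is shown below. The helper name `dig` is hypothetical, the key path `birth -> place -> city -> en` is an assumption about the layout of a laureate record, and both sample records are illustrative.

```python
def dig(record, *keys, default="Unknown"):
    """Follow a chain of dictionary keys, returning `default` if any is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def get_city(laureate):
    # Assumed key path for the English city name.
    return dig(laureate, "birth", "place", "city", "en")


# Illustrative records: one individual with a birth place, one institution without.
person = {
    "fullName": {"en": "Jacobus Henricus van 't Hoff"},
    "birth": {"place": {"city": {"en": "Rotterdam"}}},
}
organization = {"orgName": {"en": "Example Organization"}}

print(get_city(person))
print(get_city(organization))
```

Centralizing the lookup keeps each `get_*` function to a single line and avoids repeating long `'a' in x and 'b' in x['a']` chains.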

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` of the previous DataFrame, make an API call to get the details of the awarded individual/institution, then iterate over the keys of the previous dictionaries in order to add the corresponding information of each `laureate_id` to the empty lists of `laureates_dict`.

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.
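
The steps above can be rehearsed offline before making any API calls. In the sketch below, the miniature `prizes_df`, the ID `X1`, and the `fake_records` lookup are all hypothetical stand-ins (the lookup replaces the API call), and only the `ID` and `Name` columns are filled to keep the example short.

```python
import pandas as pd

# Hypothetical miniature of the prizes DataFrame with a list column of IDs.
prizes_df = pd.DataFrame({"laureate_ids": [["160"], ["160", "X1"], []]})

# Flatten the lists, drop the NaN produced by the empty list, keep unique IDs.
unique_ids = prizes_df["laureate_ids"].explode().dropna().unique().tolist()

# Stand-in for the API: maps an ID to an already-parsed laureate record.
fake_records = {
    "160": {"fullName": {"en": "Jacobus Henricus van 't Hoff"}},
    "X1": {"orgName": {"en": "Example Organization"}},
}


def get_name(laureate):
    if "fullName" in laureate:
        return laureate["fullName"]["en"]
    if "orgName" in laureate:
        return laureate["orgName"]["en"]
    return "Unknown"


laureates_dict = {"ID": [], "Name": []}
functions_dict = {"ID": None, "Name": get_name}

for laureate_id in unique_ids:
    laureate = fake_records[laureate_id]  # would be the API call in the lab
    for key, function in functions_dict.items():
        # The "ID" entry has no function, so the ID itself is stored.
        value = laureate_id if function is None else function(laureate)
        laureates_dict[key].append(value)

laureates_df = pd.DataFrame(laureates_dict)
print(laureates_df)
```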

In [26]:
from datetime import datetime

def get_name(laureate):
    if 'fullName' in laureate:
        return laureate['fullName']['en']
    elif 'orgName' in laureate:
        return laureate['orgName']['en']
    return "Unknown"

def get_gender(laureate):
    if 'gender' in laureate:
        return laureate['gender']
    elif 'orgName' in laureate:
        return "None"
    return "Unknown"

def get_birthdate(laureate):
    if 'birth' in laureate and 'date' in laureate['birth']:
        return laureate['birth']['date']
    return "Unknown"

def get_age(laureate):
    birth_date = get_birthdate(laureate)
    if birth_date != "Unknown":
        try:
            birth_date = datetime.strptime(birth_date, "%Y-%m-%d")
        except ValueError:
            # Dates that do not match yyyy-mm-dd (e.g. a missing month or day) cannot be parsed
            return "Unknown"
        award_year = int(laureate['nobelPrizes'][0]['awardYear'])
        award_date = datetime(award_year, 12, 31)  # Assuming award date is at the end of the year
        age = award_date.year - birth_date.year - ((award_date.month, award_date.day) < (birth_date.month, birth_date.day))
        return age
    return "Unknown"

def get_city(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'city' in laureate['birth']['place']:
        return laureate['birth']['place']['city']['en']
    return "Unknown"

def get_country(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'country' in laureate['birth']['place']:
        return laureate['birth']['place']['country']['en']
    return "Unknown"

def get_continent(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'continent' in laureate['birth']['place']:
        return laureate['birth']['place']['continent']['en']
    return "Unknown"

def get_latitude(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'city' in laureate['birth']['place'] and 'latitude' in laureate['birth']['place']['city']:
        return laureate['birth']['place']['city']['latitude']
    return "Unknown"

def get_longitude(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'city' in laureate['birth']['place'] and 'longitude' in laureate['birth']['place']['city']:
        return laureate['birth']['place']['city']['longitude']
    return "Unknown"


In [30]:
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}


In [34]:
from tqdm import tqdm
import requests
import pandas as pd

# Placeholder IDs; the lab asks for the unique laureate IDs from the prizes DataFrame
unique_ids = [1, 2, 3]

for laureate_id in tqdm(unique_ids):
    url = f"https://api.nobelprize.org/2/laureate/{laureate_id}"
    response = requests.get(url)

    if response.status_code == 200:
        laureate = response.json()

        laureates_dict["ID"].append(laureate_id)
        for key, function in functions_dict.items():
            if key != "ID":
                laureates_dict[key].append(function(laureate))
    else:
        print(f"Failed to retrieve data for ID {laureate_id}")

laureates_df = pd.DataFrame(laureates_dict)

# Display the DataFrame
print(laureates_df.head())


100%|█████████████████████████████████████████████| 3/3 [00:02<00:00,  1.31it/s]

   ID     Name   Gender Birth_date      Age     City  Country Continent  \
0   1  Unknown  Unknown    Unknown  Unknown  Unknown  Unknown   Unknown   
1   2  Unknown  Unknown    Unknown  Unknown  Unknown  Unknown   Unknown   
2   3  Unknown  Unknown    Unknown  Unknown  Unknown  Unknown   Unknown   

  Latitude Longitude  
0  Unknown   Unknown  
1  Unknown   Unknown  
2  Unknown   Unknown  
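
Every column in the table above is 'Unknown', which is what the helper functions return when none of the expected keys are found at the top level of the parsed response. One possible cause, offered here as an assumption and not as a verified fact about the API, is that the single-laureate endpoint wraps the record in a one-element list, so key checks such as `'fullName' in laureate` are evaluated against a list instead of a dictionary. The hypothetical helper `unwrap_laureate` below sketches a defensive way to handle both shapes; the sample payload is illustrative.

```python
def unwrap_laureate(payload):
    """Return the laureate record whether the payload is a dict or a list of dicts."""
    if isinstance(payload, list):
        # Take the first record, or an empty dict if the list is empty.
        return payload[0] if payload else {}
    return payload


# Illustrative payload shaped like a one-element list.
wrapped = [{"fullName": {"en": "Jacobus Henricus van 't Hoff"}}]

print("fullName" in wrapped)                   # key check against the list
print("fullName" in unwrap_laureate(wrapped))  # key check against the record
```

Printing `type(response.json())` for one ID is a quick way to confirm which shape the endpoint actually returns before applying this.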





In [32]:
from datetime import datetime

def get_name(laureate):
    # Check if it's a person
    if 'fullName' in laureate:
        return laureate['fullName']['en']
    # Check if it's an organization
    elif 'orgName' in laureate:
        return laureate['orgName']['en']
    else:
        return "Unknown"

def get_gender(laureate):
    # Return the gender if it's a person
    if 'gender' in laureate:
        return laureate['gender']
    # Return 'None' if it's an organization
    elif 'orgName' in laureate:
        return "None"
    else:
        return "Unknown"

def get_birthdate(laureate):
    # Return the birthdate if it's a person
    if 'birth' in laureate and 'date' in laureate['birth']:
        return laureate['birth']['date']
    else:
        return "Unknown"

def get_age(laureate):
    # Calculate age if birthdate is available
    birth_date = get_birthdate(laureate)
    if birth_date != "Unknown":
        try:
            birth_date = datetime.strptime(birth_date, "%Y-%m-%d")
        except ValueError:
            # Dates that do not match yyyy-mm-dd (e.g. a missing month or day) cannot be parsed
            return "Unknown"
        award_year = int(laureate['nobelPrizes'][0]['awardYear'])
        award_date = datetime(award_year, 12, 31)  # Assuming award date is at the end of the year
        age = award_date.year - birth_date.year - ((award_date.month, award_date.day) < (birth_date.month, birth_date.day))
        return age
    else:
        return "Unknown"

def get_city(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'city' in laureate['birth']['place']:
        return laureate['birth']['place']['city']['en']
    else:
        return "Unknown"

def get_country(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'country' in laureate['birth']['place']:
        return laureate['birth']['place']['country']['en']
    else:
        return "Unknown"

def get_continent(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'continent' in laureate['birth']['place']:
        return laureate['birth']['place']['continent']['en']
    else:
        return "Unknown"

def get_latitude(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'city' in laureate['birth']['place'] and 'latitude' in laureate['birth']['place']['city']:
        return laureate['birth']['place']['city']['latitude']
    else:
        return "Unknown"

def get_longitude(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'city' in laureate['birth']['place'] and 'longitude' in laureate['birth']['place']['city']:
        return laureate['birth']['place']['city']['longitude']
    else:
        return "Unknown"


# Country ranking

Rank the countries by the number of times they have been awarded in any category.

In [38]:
if 'Country' not in laureates_df.columns:
    print("The 'Country' column is missing in the DataFrame.")
else:
    country_counts = laureates_df['Country'].value_counts().reset_index()
    
    country_counts.columns = ['Country', 'Number of Awards']
    
    country_counts = country_counts.sort_values(by='Number of Awards', ascending=False).reset_index(drop=True)
    
    print(country_counts)


   Country  Number of Awards
0  Unknown                 3
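
Since 'Unknown' is a placeholder and not a country, it may be worth excluding it before ranking. The sketch below shows one way to do that on a small made-up DataFrame (`sample_df` is hypothetical data, not API output); because the `Country` column in this lab is derived from the birth place, the ranking counts laureates by country of birth.

```python
import pandas as pd

# Hypothetical data standing in for laureates_df["Country"].
sample_df = pd.DataFrame({"Country": ["Germany", "Unknown", "Germany", "France"]})

ranking = (
    sample_df.loc[sample_df["Country"] != "Unknown", "Country"]
    .value_counts()                      # already sorted in descending order
    .rename_axis("Country")
    .reset_index(name="Number of Awards")
)
print(ranking)
```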
