# Lab | APIs

In this lab, you will collect historical data about the Nobel Prize winners using [this free and non-authenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes), the base URL is "http://api.nobelprize.org/2.1/", followed by a string that specifies what kind of information you want to retrieve. The acceptable options are:

* nobelPrizes
* nobelPrize/{category}/{year}
* laureates
* laureate/{laureateID}
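
As a small sketch of how these options combine with the base URL, the helper below only joins strings and makes no request. `build_url` is a hypothetical name introduced for illustration, and the laureate ID `160` is simply one of the IDs that shows up in the data later in this lab.

```python
BASE_URL = "http://api.nobelprize.org/2.1/"


def build_url(*segments):
    """Join path segments onto the base URL (hypothetical helper, not part of the API)."""
    return BASE_URL + "/".join(str(segment) for segment in segments)


print(build_url("nobelPrizes"))    # endpoint listing all prizes
print(build_url("laureate", 160))  # endpoint for a single laureate ID
```

Keeping the base URL in one constant makes it harder to mix API versions between requests.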

# Getting the information using requests

Use the Python `requests` and `json` libraries to obtain the information on ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The JSON output is plain text that needs to be converted into the corresponding nested dictionary. Use the `.json()` method to parse the output into a Python dictionary.
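
To illustrate what that conversion does, here is a minimal sketch using the standard `json` module on a hand-written string. The string is an abridged stand-in shaped like the fields used in this lab, not real API output; calling `.json()` on a `requests` response performs roughly the same parsing on the response body.

```python
import json

# Abridged, hand-written stand-in for the text body of a response (not real API output)
raw_text = (
    '{"nobelPrizes": [{"category": {"en": "Chemistry"}, '
    '"dateAwarded": "1901-11-12", "prizeAmount": 150782}]}'
)

data = json.loads(raw_text)           # plain text -> nested dicts and lists
first_prize = data["nobelPrizes"][0]  # a list of prize dictionaries sits under one key

print(type(data).__name__)
print(first_prize["category"]["en"], first_prize["prizeAmount"])
```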

Use the Pandas library to collect all the information into a Pandas DataFrame.

In [53]:
import requests
import json
import pandas as pd

url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code == 200:
    print("All good!")
    print("==============")
    print("\n")
    # Your code here
else:
    print(f"Error: {response.status_code}")

All good!




In [55]:
data = response.json()
#display(data)

In [57]:
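from datetime import datetime

prizes_list = data['nobelPrizes']  # list of prize dictionaries returned by the API
processed_data = []  # one flat dictionary per prize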

In [64]:
for prize in prizes_list:
        # Extract required fields
        category = prize['category']['en']  # Category of the prize
        
        # Handle missing 'dateAwarded' by setting it to None if not present
        dateAwarded = prize.get('dateAwarded', None)
        if dateAwarded:
            dateAwarded = datetime.strptime(dateAwarded, "%Y-%m-%d").strftime("%Y-%m-%d")
        
        prizeAmount = prize.get('prizeAmount', None)  # Prize Amount
        prizeAmountAdjusted = prize.get('prizeAmountAdjusted', None)  # Adjusted Prize Amount
        
        # Extract laureate motivations and IDs
        motivation = ', '.join([laureate.get('motivation', {}).get('en', '') for laureate in prize.get('laureates', [])])
        laureate_ids = [laureate['id'] for laureate in prize.get('laureates', [])]  # Laureate IDs
        number_of_laureates = len(prize.get('laureates', []))  # Number of laureates
        
        # Append the processed data
        processed_data.append({
            'category': category,
            'dateAwarded': dateAwarded,
            'prizeAmount': prizeAmount,
            'prizeAmountAdjusted': prizeAmountAdjusted,
            'Number_of_laureates': number_of_laureates,
            'motivation': motivation,
            'laureate_ids': laureate_ids
        })
    
# Create a Pandas DataFrame from the processed data
df = pd.DataFrame(processed_data)

# Display the first few rows of the DataFrame
display(df.head())

|   | category | dateAwarded | prizeAmount | prizeAmountAdjusted | Number_of_laureates | motivation | laureate_ids |
|---|---|---|---|---|---|---|---|
| 0 | Chemistry | 1901-11-12 | 150782 | 10531894 | 1 | in recognition of the extraordinary services h... | [160] |
| 1 | Literature | 1901-11-14 | 150782 | 10531894 | 1 | in special recognition of his poetic compositi... | [569] |
| 2 | Peace | 1901-12-10 | 150782 | 10531894 | 2 | for his humanitarian efforts to help wounded s... | [462, 463] |
| 3 | Physics | 1901-11-12 | 150782 | 10531894 | 1 | in recognition of the extraordinary services h... | [1] |
| 4 | Physiology or Medicine | 1901-10-30 | 150782 | 10531894 | 1 | for his work on serum therapy, especially its ... | [293] |


# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)
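
If `dateAwarded` is still stored as text after extraction, one way to meet the DateTime requirement is `pd.to_datetime`. The sketch below assumes the values are either "yyyy-mm-dd" strings or missing; the three-row frame is made up for illustration.

```python
import pandas as pd

# Made-up sample: two dated prizes and one prize without a date
sample = pd.DataFrame({"dateAwarded": ["1901-11-12", None, "1901-12-10"]})

# errors="coerce" turns anything that cannot be parsed into NaT instead of raising
sample["dateAwarded"] = pd.to_datetime(
    sample["dateAwarded"], format="%Y-%m-%d", errors="coerce"
)

print(sample.dtypes)
print(sample["dateAwarded"].iloc[0].year)
```

With a real datetime column, accessors such as `.dt.year` become available, and missing dates stay visible as `NaT` rather than silently becoming empty strings.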

In [68]:
from datetime import datetime

# Define the URL to fetch the Nobel Prize data
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

# Fetch the data using requests
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    print("Data fetched successfully!")
    
    # Extract the 'nobelPrizes' data from the response
    prizes_list = response.json()['nobelPrizes']
    
    # List to hold processed data
    processed_data = []
    
    # Iterate over the prize entries
    for prize in prizes_list:
        # Extract required fields
        category = prize['category']['en']  # Category of the prize
        
        # Handle missing 'dateAwarded' by setting it to None if not present
        dateAwarded = prize.get('dateAwarded', None)
        if dateAwarded:
            dateAwarded = datetime.strptime(dateAwarded, "%Y-%m-%d").strftime("%Y-%m-%d")
        
        prizeAmount = prize.get('prizeAmount', None)  # Prize Amount
        prizeAmountAdjusted = prize.get('prizeAmountAdjusted', None)  # Adjusted Prize Amount
        
        # Extract laureate motivations and IDs
        motivation = ', '.join([laureate.get('motivation', {}).get('en', '') for laureate in prize.get('laureates', [])])
        laureate_ids = [laureate['id'] for laureate in prize.get('laureates', [])]  # Laureate IDs
        number_of_laureates = len(prize.get('laureates', []))  # Number of laureates
        
        # Append the processed data
        processed_data.append({
            'category': category,
            'dateAwarded': dateAwarded,
            'prizeAmount': prizeAmount,
            'prizeAmountAdjusted': prizeAmountAdjusted,
            'Number_of_laureates': number_of_laureates,
            'motivation': motivation,
            'laureate_ids': laureate_ids
        })
    
    # Create a Pandas DataFrame from the processed data
    df = pd.DataFrame(processed_data)
    
    # Display the first few rows of the DataFrame
    display(df.head())
    
else:
    print(f"Failed to fetch data. Status code: {response.status_code}")

Data fetched successfully!


|   | category | dateAwarded | prizeAmount | prizeAmountAdjusted | Number_of_laureates | motivation | laureate_ids |
|---|---|---|---|---|---|---|---|
| 0 | Chemistry | 1901-11-12 | 150782 | 10531894 | 1 | in recognition of the extraordinary services h... | [160] |
| 1 | Literature | 1901-11-14 | 150782 | 10531894 | 1 | in special recognition of his poetic compositi... | [569] |
| 2 | Peace | 1901-12-10 | 150782 | 10531894 | 2 | for his humanitarian efforts to help wounded s... | [462, 463] |
| 3 | Physics | 1901-11-12 | 150782 | 10531894 | 1 | in recognition of the extraordinary services h... | [1] |
| 4 | Physiology or Medicine | 1901-10-30 | 150782 | 10531894 | 1 | for his work on serum therapy, especially its ... | [293] |


# Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some laureate_ids, you will notice that the Nobel Prize was not always awarded to individuals. In some cases, the awards were given to institutions.

Get the unique IDs from the previous dataset and prepare the following functions:

- get_name(laureate) (it should return the English name: 'fullName' for an individual or 'orgName' for an institution)

- get_gender(laureate) (it should return the gender, or 'Unknown' if it is missing, for individuals, and 'None' for institutions)

- get_birthdate(laureate) (it should return the birthdate when it's available or 'Unknown' otherwise)

- get_age(laureate) (it should return the age of the awarded individual, or 'Unknown' when it's not available or for institutions)

- get_city(laureate) (it should return the English name of the city when it's available or 'Unknown' otherwise)

- get_country(laureate) (it should return the English name of the country when it's available or 'Unknown' otherwise)

- get_continent(laureate) (it should return the English name of the continent when it's available or 'Unknown' otherwise)

- get_latitude(laureate) (it should return the city's latitude when it's available or 'Unknown' otherwise)

- get_longitude(laureate) (it should return the city's longitude when it's available or 'Unknown' otherwise)
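
Most of these functions reduce to the same pattern: walk a chain of nested keys and fall back to 'Unknown' when any level is missing. A minimal sketch of that pattern follows. `dig` is a hypothetical helper name, and the two sample records are hand-written with a shape assumed from the field names used in this lab, not real API output.

```python
def dig(record, *keys, default="Unknown"):
    """Follow `keys` through nested dicts; return `default` if any level is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hand-written sample records (assumed shape, for illustration only)
person = {"fullName": {"en": "Jane Doe"}, "birth": {"place": {"country": {"en": "France"}}}}
org = {"orgName": {"en": "Example Organization"}}

print(dig(person, "birth", "place", "country", "en"))  # France
print(dig(org, "birth", "place", "country", "en"))     # Unknown
```

A single helper like this keeps each `get_*` function to one line and avoids repeating the chain of `'key' in ...` checks.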

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` in the previous DataFrame, make an API call to get the details of the awarded individual/institution, then iterate over the keys of the previous dictionaries to append the corresponding information for each `laureate_id` to the empty lists of `laureates_dict`.

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.
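
The iteration over the dictionary keys can be sketched as follows. To keep the example runnable without network access, it uses only three of the keys, two simplified stand-in extractors, and hand-written sample records instead of API responses; all of these are assumptions made for illustration.

```python
# Simplified stand-ins; in the lab these would be the full get_name, get_gender, ...
def get_name(laureate):
    return laureate.get("fullName", laureate.get("orgName", {})).get("en", "Unknown")


def get_gender(laureate):
    return laureate.get("gender", "None" if "orgName" in laureate else "Unknown")


laureates_dict = {"ID": [], "Name": [], "Gender": []}
functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender}

# Hand-written sample records keyed by laureate ID (not real API output)
sample_records = {
    1: {"fullName": {"en": "Jane Doe"}, "gender": "female"},
    2: {"orgName": {"en": "Example Organization"}},
}

for laureate_id, laureate in sample_records.items():
    for key, func in functions_dict.items():
        # "ID" has no extractor, so the ID itself is stored
        value = laureate_id if func is None else func(laureate)
        laureates_dict[key].append(value)

print(laureates_dict)
```

Because every key receives exactly one value per laureate, all lists end up the same length, which is what `pd.DataFrame(laureates_dict)` requires.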

In [74]:
import time
from datetime import datetime
from tqdm import tqdm
import requests
import pandas as pd

ids = [int(item) for l in df['laureate_ids'].values for item in l]
unique_ids = set(ids)

# Function definitions
def get_name(laureate):
    if 'fullName' in laureate:  # Person
        return laureate['fullName']['en']
    elif 'orgName' in laureate:  # Organization
        return laureate['orgName']['en']
    else:
        return 'Unknown'

def get_gender(laureate):
    if 'gender' in laureate:
        return laureate['gender']
    elif 'orgName' in laureate:  # Institutions have no gender
        return 'None'
    return 'Unknown'

def get_birthdate(laureate):
    return laureate.get('birth', {}).get('date', 'Unknown')

def get_age(laureate, award_year):
    birth_date = laureate.get('birth', {}).get('date', None)
    if birth_date and award_year:
        try:
            birth_year = datetime.strptime(birth_date, "%Y-%m-%d").year
            return int(award_year) - birth_year
        except ValueError:
            return 'Unknown'
    return 'Unknown'

def get_city(laureate):
    return laureate.get('birth', {}).get('place', {}).get('city', {}).get('en', 'Unknown')

def get_country(laureate):
    return laureate.get('birth', {}).get('place', {}).get('country', {}).get('en', 'Unknown')

def get_continent(laureate):
    return laureate.get('birth', {}).get('place', {}).get('continent', {}).get('en', 'Unknown')

def get_latitude(laureate):
    return laureate.get('birth', {}).get('place', {}).get('locationString', {}).get('latitude', 'Unknown')

def get_longitude(laureate):
    return laureate.get('birth', {}).get('place', {}).get('locationString', {}).get('longitude', 'Unknown')

# Dictionary to store laureate information
laureates_dict = {
    "ID": [], "Name": [], "Gender": [], 
    "Birth_date": [], "Age": [], 
    "City": [], "Country": [], "Continent": [], 
    "Latitude": [], "Longitude": []
}

# Loop through unique IDs and make API requests
for laureate_id in tqdm(unique_ids):

    url = f"https://api.nobelprize.org/2/laureate/{laureate_id}"
    response = requests.get(url)

    if response.status_code == 200:
        laureates = response.json()['laureates']  # This is a list

        for laureate in laureates:
            # Process each laureate
            award_year = df[df['laureate_ids'].apply(lambda x: laureate_id in x)]['dateAwarded'].iloc[0][:4]  # Extract award year
            
            # Populate the laureates_dict with values
            laureates_dict["ID"].append(laureate_id)
            laureates_dict["Name"].append(get_name(laureate))
            laureates_dict["Gender"].append(get_gender(laureate))
            laureates_dict["Birth_date"].append(get_birthdate(laureate))
            laureates_dict["Age"].append(get_age(laureate, award_year))
            laureates_dict["City"].append(get_city(laureate))
            laureates_dict["Country"].append(get_country(laureate))
            laureates_dict["Continent"].append(get_continent(laureate))
            laureates_dict["Latitude"].append(get_latitude(laureate))
            laureates_dict["Longitude"].append(get_longitude(laureate))

    # Sleep to avoid overwhelming the API
    time.sleep(0.2)

# Convert the laureates_dict into a Pandas DataFrame
laureates_df = pd.DataFrame(laureates_dict)

# Display the DataFrame
display(laureates_df)

  0%|          | 0/992 [00:00<?, ?it/s]


TypeError: list indices must be integers or slices, not str

In [113]:
import time
from tqdm import tqdm


ids = [int(item) for l in df['laureate_ids'].values for item in l]
unique_ids = set(ids)

# get_name(laureate) ( it should return the english name 'fullName' of the individual or 'orgName' of the institution )
def get_name(laureate):
    if 'fullName' in laureate:
        return laureate['fullName']['en']
    elif 'orgName' in laureate:
        return laureate['orgName']['en']
    return "Unknown"

# get_gender(laureate) ( it should return the gender or 'Unknown' for individuals, and 'None' for institutions )
def get_gender(laureate):
    if 'gender' in laureate:
        return laureate['gender']
    elif 'orgName' in laureate:
        return "None"
    return "Unknown"
    
# get_birthdate(laureate) ( it should return the birthdate when it's available or 'Unknown' otherwise )
def get_birthdate(laureate):
    if 'birth' in laureate and 'date' in laureate['birth']:
        return laureate['birth']['date']
    return "Unknown"
    
# get_age(laureate, award_year) ( it should return the age of the awarded individual or 'Unknown' when it's not available or for institutions )
def get_age(laureate, award_year=None):
    # award_year is passed in explicitly because the age depends on the prize year
    birth_date = get_birthdate(laureate)
    if birth_date != "Unknown" and award_year is not None:
        birth_year = int(birth_date.split('-')[0])
        return int(award_year) - birth_year
    return "Unknown"

# get_city(laureate) ( it should return the english name of the city when it's available or 'Unknown' otherwise )
def get_city(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'city' in laureate['birth']['place']:
        return laureate['birth']['place']['city']['en']
    return "Unknown"
    
# get_country(laureate) ( it should return the english name of the country when it's available or 'Unknown' otherwise )
def get_country(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'country' in laureate['birth']['place']:
        return laureate['birth']['place']['country']['en']
    return "Unknown"

# get_continent(laureate) ( it should return the english name of the continent when it's available or 'Unknown' otherwise )
def get_continent(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'continent' in laureate['birth']['place']:
        return laureate['birth']['place']['continent']['en']
    return "Unknown"
    
# get_latitude(laureate) ( it should return the city's latitude when it's available or 'Unknown' otherwise )
def get_latitude(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'locationString' in laureate['birth']['place']:
        return laureate['birth']['place']['locationString'].get('latitude', "Unknown")
    return "Unknown"
    
# get_longitude(laureate) ( it should return the city's longitude when it's available or 'Unknown' otherwise )
def get_longitude(laureate):
    if 'birth' in laureate and 'place' in laureate['birth'] and 'locationString' in laureate['birth']['place']:
        return laureate['birth']['place']['locationString'].get('longitude', "Unknown")
    return "Unknown"


laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}

for laureate_id in tqdm(unique_ids):

    url = "https://api.nobelprize.org/2/laureate/" + str(laureate_id)
    response = requests.get(url)

    if response.status_code == 200:

        laureate_data = response.json()
        # The endpoint returns a JSON list with one laureate record; a dict with a
        # 'laureates' key is also accepted in case the response shape differs
        records = laureate_data if isinstance(laureate_data, list) else laureate_data.get('laureates', [])

        if records:
            laureate = records[0]

            # Year of the first dated prize that lists this laureate (IDs compared as strings)
            awarded = df[df['laureate_ids'].apply(lambda x: str(laureate_id) in map(str, x))]['dateAwarded']
            awarded = pd.to_datetime(awarded, errors='coerce').dropna()
            award_year = awarded.dt.year.iloc[0] if len(awarded) else None

            laureates_dict["ID"].append(laureate_id)
            laureates_dict["Name"].append(get_name(laureate))
            laureates_dict["Gender"].append(get_gender(laureate))
            laureates_dict["Birth_date"].append(get_birthdate(laureate))
            laureates_dict["Age"].append(get_age(laureate, award_year))
            laureates_dict["City"].append(get_city(laureate))
            laureates_dict["Country"].append(get_country(laureate))
            laureates_dict["Continent"].append(get_continent(laureate))
            laureates_dict["Latitude"].append(get_latitude(laureate))
            laureates_dict["Longitude"].append(get_longitude(laureate))

laureates_df = pd.DataFrame(laureates_dict)

display(laureates_df.head())

# Country ranking

Rank the countries by the number of times they have been awarded in any category.
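
A minimal sketch of such a ranking using `collections.Counter` from the standard library. The country values are made up for illustration and stand in for the "Country" column; on a DataFrame, `value_counts()` on that column gives an equivalent ranking.

```python
from collections import Counter

# Made-up country values standing in for the "Country" column
countries = ["USA", "France", "USA", "Unknown", "Germany", "France", "USA"]

# Skip laureates whose country could not be determined
counts = Counter(country for country in countries if country != "Unknown")

# most_common() returns (country, count) pairs, highest count first
ranking = counts.most_common()
print(ranking)
```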


In [None]:
# Your code here



| Country | ID |
|---|---|
| USA | 296 |
| United Kingdom | 91 |
| Germany | 84 |
| France | 63 |
| Russia | 30 |
| ... | ... |
| Greece | 1 |
| Ghana | 1 |
| Faroe Islands (Denmark) | 1 |
| Ethiopia | 1 |


In [76]:
# Create an empty dictionary to store the country counts
country_award_count = {}

# Loop through the country values collected in laureates_dict
for country in laureates_dict["Country"]:
    if country != 'Unknown':  # Ignore unknown countries
        if country in country_award_count:
            country_award_count[country] += 1
        else:
            country_award_count[country] = 1

# Convert the dictionary to a pandas DataFrame for ranking
country_award_df = pd.DataFrame(list(country_award_count.items()), columns=['Country', 'Award_Count'])

# Sort the dataframe by Award_Count in descending order
country_award_df = country_award_df.sort_values(by='Award_Count', ascending=False).reset_index(drop=True)

# Display the top countries
display(country_award_df)

