# Lab | APIS

In this lab, you will collect historical data about the Nobel Prize winners using [this free and non-authenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes), the base URL is "http://api.nobelprize.org/2.1/", followed by a string that specifies what kind of information you want to retrieve. The acceptable options are:

* nobelPrizes
* nobelPrize/category/year
* laureates
* laureate/laureateID
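
As a minimal sketch of how these options combine with the base URL, the helper below assembles request URLs as plain strings. The name `build_url` is hypothetical (it is not part of the API or of `requests`); the `limit` query parameter, the `che` category code, and laureate id `160` are taken from the request and the sample output shown later in this lab.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.nobelprize.org/2.1/"


def build_url(path, **params):
    """Join the base URL with an endpoint path and optional query parameters.

    `build_url` is a hypothetical helper used only for illustration.
    """
    url = BASE_URL + path.strip("/")
    if params:
        url += "?" + urlencode(params)
    return url


# One URL per kind of request used in this lab
all_prizes_url = build_url("nobelPrizes", limit=100000)
one_prize_url = build_url("nobelPrize/che/1901")   # category code / year
one_laureate_url = build_url("laureate/160")       # laureate id

print(all_prizes_url)
print(one_prize_url)
print(one_laureate_url)
```

Building the URL in one place keeps the base URL and the endpoint names from drifting apart between cells.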

# Getting the information using requests

Use the Python `requests` and `json` libraries to obtain the information on ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The JSON output is plain text that needs to be converted into the corresponding nested dictionary. Use the `.json()` method to parse the response body into a Python dictionary.
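
To illustrate what that conversion does without making a network call, the sketch below parses a small hand-written JSON string with `json.loads`, which is essentially the step that `.json()` performs on a response body. The string `raw_text` is a shortened stand-in modelled on the first prize record, not real API output.

```python
import json

# Shortened, hand-written stand-in for the text body of an API response
raw_text = (
    '{"nobelPrizes": [{"awardYear": "1901", '
    '"category": {"en": "Chemistry"}, "prizeAmount": 150782}]}'
)

parsed = json.loads(raw_text)          # text -> nested dicts and lists
first_prize = parsed["nobelPrizes"][0]

print(type(parsed).__name__)           # the top level is a dictionary
print(first_prize["category"]["en"])   # nested values are reached key by key
print(type(first_prize["prizeAmount"]).__name__)  # JSON numbers become Python numbers
```

Note that `awardYear` arrives as a string while `prizeAmount` arrives as a number, so any arithmetic on years needs an explicit `int(...)` conversion.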

Use the pandas library to collect all the information into a pandas DataFrame.

In [25]:
import requests
import json
import pandas as pd

url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code != 200:
    # Fail loudly so that nothing below runs on missing data
    raise RuntimeError(f"Request failed with status code {response.status_code}")

print("All good!")
print("==============")
print("\n")

data = response.json()

# Uncomment to inspect the structure of a single prize record
# print(json.dumps(data['nobelPrizes'][0], indent=4))

# List to hold the extracted information for each prize
prizes_data = []

for prize in data['nobelPrizes']:
    laureates = prize.get('laureates', [])
    prize_info = {
        "awardYear": prize.get('awardYear', 'Unknown'),
        "category": prize.get('category', {}).get('en', 'Unknown'),
        "categoryFullName": prize.get('categoryFullName', {}).get('en', 'Unknown'),
        "prizeAmount": prize.get('prizeAmount', 'Unknown'),
        "prizeAmountAdjusted": prize.get('prizeAmountAdjusted', 'Unknown'),
        "dateAwarded": prize.get('dateAwarded', 'Unknown'),
        # Individuals have a 'knownName'; institutions have an 'orgName'
        "laureates": ", ".join(
            laureate.get('knownName', {}).get('en', laureate.get('orgName', {}).get('en', 'Unknown'))
            for laureate in laureates
        ),
        "motivation": ", ".join(
            laureate.get('motivation', {}).get('en', 'Unknown') for laureate in laureates
        ),
        "numberOfLaureates": len(laureates)
    }
    prizes_data.append(prize_info)

# Create a DataFrame from the list of dictionaries
prizes_df = pd.DataFrame(prizes_data)

prizes_df

All good!


{
    "awardYear": "1901",
    "category": {
        "en": "Chemistry",
        "no": "Kjemi",
        "se": "Kemi"
    },
    "categoryFullName": {
        "en": "The Nobel Prize in Chemistry",
        "no": "Nobelprisen i kjemi",
        "se": "Nobelpriset i kemi"
    },
    "dateAwarded": "1901-11-12",
    "prizeAmount": 150782,
    "prizeAmountAdjusted": 10531894,
    "links": [
        {
            "rel": "nobelPrize",
            "href": "https://api.nobelprize.org/2/nobelPrize/che/1901",
            "action": "GET",
            "types": "application/json"
        }
    ],
    "laureates": [
        {
            "id": "160",
            "knownName": {
                "en": "Jacobus H. van 't Hoff"
            },
            "fullName": {
                "en": "Jacobus Henricus van 't Hoff"
            },
            "portion": "1",
            "sortOrder": "1",
            "motivation": {
                "en": "in recognition of the extraordinary services he has ren

| | awardYear | category | categoryFullName | prizeAmount | prizeAmountAdjusted | dateAwarded | laureates | motivation | numberOfLaureates |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry | 150782 | 10531894 | 1901-11-12 | Jacobus H. van 't Hoff | in recognition of the extraordinary services h... | 1 |
| 1 | 1901 | Literature | The Nobel Prize in Literature | 150782 | 10531894 | 1901-11-14 | Sully Prudhomme | in special recognition of his poetic compositi... | 1 |
| 2 | 1901 | Peace | The Nobel Peace Prize | 150782 | 10531894 | 1901-12-10 | Henry Dunant, Frédéric Passy | for his humanitarian efforts to help wounded s... | 2 |
| 3 | 1901 | Physics | The Nobel Prize in Physics | 150782 | 10531894 | 1901-11-12 | Wilhelm Conrad Röntgen | in recognition of the extraordinary services h... | 1 |
| 4 | 1901 | Physiology or Medicine | The Nobel Prize in Physiology or Medicine | 150782 | 10531894 | 1901-10-30 | Emil von Behring | for his work on serum therapy, especially its ... | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 665 | 2023 | Economic Sciences | The Sveriges Riksbank Prize in Economic Scienc... | 11000000 | 11000000 | 2023-10-09 | Claudia Goldin | for having advanced our understanding of women... | 1 |
| 666 | 2023 | Literature | The Nobel Prize in Literature | 11000000 | 11000000 | 2023-10-05 | Jon Fosse | for his innovative plays and prose which give ... | 1 |
| 667 | 2023 | Peace | The Nobel Peace Prize | 11000000 | 11000000 | 2023-10-06 | Narges Mohammadi | for her fight against the oppression of women ... | 1 |
| 668 | 2023 | Physics | The Nobel Prize in Physics | 11000000 | 11000000 | 2023-10-03 | Pierre Agostini, Ferenc Krausz, Anne L’Huillier | for experimental methods that generate attosec... | 3 |


# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)
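
The column list above can be sketched as a per-record transformation in plain Python, independent of pandas. `flatten_prize` and `parse_date` are hypothetical helper names, and the sample record is a shortened copy of the first prize in the output above with a placeholder motivation text; in the DataFrame version, the date column would be converted with `pd.to_datetime` instead.

```python
from datetime import datetime


def parse_date(text):
    """Return a date for 'yyyy-mm-dd' text, or None when missing or malformed."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None


def flatten_prize(prize):
    """Reduce one nested prize record to the flat columns requested above."""
    laureates = prize.get("laureates", [])
    return {
        "category": prize.get("category", {}).get("en", "Unknown"),
        "dateAwarded": parse_date(prize.get("dateAwarded")),
        "prizeAmount": prize.get("prizeAmount", "Unknown"),
        "prizeAmountAdjusted": prize.get("prizeAmountAdjusted", "Unknown"),
        "Number_of_laureates": len(laureates),
        "motivation": ", ".join(
            person.get("motivation", {}).get("en", "Unknown") for person in laureates
        ),
        "laureate_ids": [person.get("id", "Unknown") for person in laureates],
    }


# Shortened sample record; the motivation text is a placeholder, not the real one
sample_prize = {
    "category": {"en": "Chemistry"},
    "dateAwarded": "1901-11-12",
    "prizeAmount": 150782,
    "prizeAmountAdjusted": 10531894,
    "laureates": [{"id": "160", "motivation": {"en": "placeholder motivation"}}],
}

row = flatten_prize(sample_prize)
print(row)
```

Returning `None` for a missing date mirrors what `errors='coerce'` does in pandas, where unparseable values become `NaT` rather than raising an error.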

In [33]:
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code != 200:
    # Fail loudly so that `df` is never displayed without having been built
    raise RuntimeError(f"Request failed with status code {response.status_code}")

prizes_list = response.json()['nobelPrizes']

processed_data = []

# Loop through each prize entry and extract the required information
for prize in prizes_list:
    laureates = prize.get('laureates', [])

    # Append the processed data to the list
    processed_data.append({
        'category': prize.get('category', {}).get('en', 'Unknown'),
        'dateAwarded': prize.get('dateAwarded', 'Unknown'),
        'prizeAmount': prize.get('prizeAmount', 'Unknown'),
        'prizeAmountAdjusted': prize.get('prizeAmountAdjusted', 'Unknown'),
        'Number_of_laureates': len(laureates),
        # One motivation per laureate, joined into a single string
        'motivation': ", ".join(
            laureate.get('motivation', {}).get('en', 'Unknown') for laureate in laureates
        ),
        'laureate_ids': [laureate.get('id', 'Unknown') for laureate in laureates]
    })

df = pd.DataFrame(processed_data)

# Convert the 'dateAwarded' column to DateTime format;
# missing or malformed dates become NaT instead of raising an error
df['dateAwarded'] = pd.to_datetime(df['dateAwarded'], errors='coerce')

df

| | category | dateAwarded | prizeAmount | prizeAmountAdjusted | Number_of_laureates | motivation | laureate_ids |
|---|---|---|---|---|---|---|---|
| 0 | Chemistry | 1901-11-12 | 150782 | 10531894 | 1 | in recognition of the extraordinary services h... | [160] |
| 1 | Literature | 1901-11-14 | 150782 | 10531894 | 1 | in special recognition of his poetic compositi... | [569] |
| 2 | Peace | 1901-12-10 | 150782 | 10531894 | 2 | for his humanitarian efforts to help wounded s... | [462, 463] |
| 3 | Physics | 1901-11-12 | 150782 | 10531894 | 1 | in recognition of the extraordinary services h... | [1] |
| 4 | Physiology or Medicine | 1901-10-30 | 150782 | 10531894 | 1 | for his work on serum therapy, especially its ... | [293] |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 665 | Economic Sciences | 2023-10-09 | 11000000 | 11000000 | 1 | for having advanced our understanding of women... | [1034] |
| 666 | Literature | 2023-10-05 | 11000000 | 11000000 | 1 | for his innovative plays and prose which give ... | [1032] |
| 667 | Peace | 2023-10-06 | 11000000 | 11000000 | 1 | for her fight against the oppression of women ... | [1033] |
| 668 | Physics | 2023-10-03 | 11000000 | 11000000 | 3 | for experimental methods that generate attosec... | [1026, 1027, 1028] |


# Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some `laureate_ids`, you will notice that the Nobel Prize was not always awarded to individuals. In some cases, the awards were given to institutions.
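
One way to tell the two cases apart, sketched below, is to look at which name field a record carries: the prize records above use `orgName` for institutions and `knownName`/`fullName` for individuals. The function name `laureate_kind` and the organization record are made up for illustration.

```python
def laureate_kind(laureate):
    """Classify a laureate record by the name field it carries."""
    if "orgName" in laureate:
        return "institution"
    if "fullName" in laureate or "knownName" in laureate:
        return "individual"
    return "unknown"


# Trimmed record based on the sample output in this lab
person = {"id": "1034", "fullName": {"en": "Claudia Goldin"}, "gender": "female"}
# Made-up record, only to exercise the institution branch
organization = {"id": "0", "orgName": {"en": "Example Organization"}}

print(laureate_kind(person))
print(laureate_kind(organization))
```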

Get the unique ids from the previous datasets and prepare the following functions:

- get_name(laureate) (it should return the English 'fullName' of the individual or the 'orgName' of the institution)

- get_gender(laureate) (it should return the gender, or 'Unknown', for individuals, and 'None' for institutions)

- get_birthdate(laureate) (it should return the birthdate when it's available or 'Unknown' otherwise)

- get_age(laureate) (it should return the age of the awarded individual, or 'Unknown' when it's not available or for institutions)

- get_city(laureate) (it should return the English name of the city when it's available or 'Unknown' otherwise)

- get_country(laureate) (it should return the English name of the country when it's available or 'Unknown' otherwise)

- get_continent(laureate) (it should return the English name of the continent when it's available or 'Unknown' otherwise)

- get_latitude(laureate) (it should return the city's latitude when it's available or 'Unknown' otherwise)

- get_longitude(laureate) (it should return the city's longitude when it's available or 'Unknown' otherwise)
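
Most of these functions share the same shape: walk down a few nested keys and fall back to 'Unknown' if any of them is missing. A minimal sketch of that pattern follows; `dig` is a hypothetical helper name, and the sample record is a trimmed version of the laureate structure shown further down.

```python
def dig(record, *keys, default="Unknown"):
    """Follow keys through nested dictionaries; return default if any step is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Trimmed laureate record with only the fields needed for the demonstration
sample = {
    "birth": {
        "date": "1946-00-00",
        "place": {"city": {"en": "New York, NY"}, "country": {"en": "USA"}},
    }
}

city = dig(sample, "birth", "place", "city", "en")
continent = dig(sample, "birth", "place", "continent", "en")  # key is absent here
org_birth = dig({"orgName": {"en": "Example Organization"}}, "birth", "date")

print(city, continent, org_birth)
```

With such a helper, a function like `get_country` reduces to a single call, and institutions (which have no `birth` entry) fall through to 'Unknown' without special handling.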

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` in the previous DataFrame, make an API call to get the details of the awarded individual/institution, then iterate over the keys of the previous dictionaries to append the corresponding information for each `laureate_id` to the empty lists of `laureates_dict`.
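
The dictionary-of-functions pattern described above can be tried out without any API call. The sketch below uses only three columns, two simplified stand-in extractors (`name_of` and `gender_of`, hypothetical names), and stub records in place of real responses; the second stub is a made-up institution.

```python
def name_of(laureate):
    """Simplified stand-in for get_name."""
    for key in ("fullName", "orgName"):
        if key in laureate:
            return laureate[key].get("en", "Unknown")
    return "Unknown"


def gender_of(laureate):
    """Simplified stand-in for get_gender."""
    if "orgName" in laureate:
        return "None"
    return laureate.get("gender", "Unknown")


columns = {"ID": [], "Name": [], "Gender": []}
extractors = {"ID": None, "Name": name_of, "Gender": gender_of}

# Stub records standing in for API responses; id 9999 is made up
stub_records = {
    1034: {"fullName": {"en": "Claudia Goldin"}, "gender": "female"},
    9999: {"orgName": {"en": "Example Organization"}},
}

for laureate_id, laureate in stub_records.items():
    for key, func in extractors.items():
        # "ID" has no extractor, so the id itself is stored
        value = laureate_id if func is None else func(laureate)
        columns[key].append(value)

print(columns)
```

Because every key receives exactly one value per record, all lists stay the same length, which is what `pd.DataFrame` needs when it is given a dictionary of lists.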

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.

In [43]:
print(json.dumps(laureate, indent=4))

[
    {
        "id": "1034",
        "knownName": {
            "en": "Claudia Goldin",
            "se": "Claudia Goldin"
        },
        "givenName": {
            "en": "Claudia",
            "se": "Claudia"
        },
        "familyName": {
            "en": "Goldin",
            "se": "Goldin"
        },
        "fullName": {
            "en": "Claudia Goldin",
            "se": "Claudia Goldin"
        },
        "fileName": "goldin",
        "gender": "female",
        "birth": {
            "date": "1946-00-00",
            "place": {
                "city": {
                    "en": "New York, NY",
                    "no": "New York, NY",
                    "se": "New York, NY"
                },
                "country": {
                    "en": "USA",
                    "no": "USA",
                    "se": "USA"
                },
                "cityNow": {
                    "en": "New York, NY",
                    "no": "New York, NY",
                 

In [67]:
import time
from tqdm import tqdm
from datetime import datetime

ids = [int(item) for l in df["laureate_ids"].values for item in l]
unique_ids = set(ids)

def get_name(laureate):
    if "fullName" in laureate:
        return laureate["fullName"].get("en", "Unknown")
    elif "orgName" in laureate:
        return laureate["orgName"].get("en", "Unknown")
    return "Unknown"

def get_gender(laureate):
    # Institutions carry an 'orgName' and have no gender
    if "orgName" in laureate:
        return "None"
    return laureate.get("gender", "Unknown")

def get_birthdate(laureate):
    if "birth" in laureate and "date" in laureate["birth"]:
        date = laureate["birth"]["date"]
        if date.count("-") == 2 and not date.endswith("-00-00"):
            return date
    return "Unknown"

def get_age(laureate):
    # Age at the first listed prize, approximated as award year minus birth year
    birth_date = get_birthdate(laureate)
    if birth_date == "Unknown":
        return "Unknown"
    # Fall back to an empty record when 'nobelPrizes' is missing or empty
    prizes = laureate.get("nobelPrizes") or [{}]
    try:
        birth_year = datetime.strptime(birth_date, "%Y-%m-%d").year
        award_year = int(prizes[0].get("awardYear", "Unknown"))
    except ValueError:
        # Raised for partial dates such as "1946-05-00" or a missing award year
        return "Unknown"
    return award_year - birth_year

def get_city(laureate):
    if "birth" in laureate and "place" in laureate["birth"] and "city" in laureate["birth"]["place"]:
        return laureate["birth"]["place"]["city"].get("en", "Unknown")
    return "Unknown"

def get_country(laureate):
    if "birth" in laureate and "place" in laureate["birth"] and "country" in laureate["birth"]["place"]:
        return laureate["birth"]["place"]["country"].get("en", "Unknown")
    return "Unknown"

def get_continent(laureate):
    if "birth" in laureate and "place" in laureate["birth"] and "continent" in laureate["birth"]["place"]:
        return laureate["birth"]["place"]["continent"].get("en", "Unknown")
    return "Unknown"

# NOTE: both coordinate functions read from 'countryNow', so the values are
# country-level coordinates rather than the city's own coordinates.
def get_latitude(laureate):
    if "birth" in laureate and "place" in laureate["birth"] and "countryNow" in laureate["birth"]["place"]:
        return laureate["birth"]["place"]["countryNow"].get("latitude", "Unknown")
    return "Unknown"

def get_longitude(laureate):
    if "birth" in laureate and "place" in laureate["birth"] and "countryNow" in laureate["birth"]["place"]:
        return laureate["birth"]["place"]["countryNow"].get("longitude", "Unknown")
    return "Unknown"


laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}

for laureate_id in tqdm(unique_ids):

    url = "https://api.nobelprize.org/2/laureate/" + str(laureate_id)
    response = requests.get(url)

    if response.status_code != 200:
        # Skip ids whose details could not be retrieved
        continue

    laureate_data = response.json()

    # The response is a list of records; wrap a bare dictionary just in case
    records = laureate_data if isinstance(laureate_data, list) else [laureate_data]

    for laureate in records:
        for key, func in functions_dict.items():
            # "ID" has no extractor function, so the id itself is stored
            value = laureate_id if func is None else func(laureate)
            laureates_dict[key].append(value)

laureates_df = pd.DataFrame(laureates_dict)
laureates_df

100%|█████████████████████████████████████████| 992/992 [04:37<00:00,  3.57it/s]


| | ID | Name | Gender | Birth_date | Age | City | Country | Continent | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wilhelm Conrad Röntgen | male | 1845-03-27 | 56 | Lennep | Prussia | Europe | 51.000000 | 10.000000 |
| 1 | 2 | Hendrik Antoon Lorentz | male | 1853-07-18 | 49 | Arnhem | the Netherlands | Europe | 52.316667 | 5.550000 |
| 2 | 3 | Pieter Zeeman | male | 1865-05-25 | 37 | Zonnemaire | the Netherlands | Europe | 52.316667 | 5.550000 |
| 3 | 4 | Antoine Henri Becquerel | male | 1852-12-15 | 51 | Paris | France | Europe | 47.000000 | 2.000000 |
| 4 | 5 | Pierre Curie | male | 1859-05-15 | 44 | Paris | France | Europe | 47.000000 | 2.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 987 | 1030 | Louis E. Brus | male | Unknown | Unknown | Cleveland, OH | USA | North America | 39.828175 | -98.579500 |
| 988 | 1031 | Aleksey Yekimov | male | Unknown | Unknown | Leningrad | USSR | Europe | 66.416667 | 94.250000 |
| 989 | 1032 | Jon Fosse | male | 1959-09-29 | 64 | Haugesund | Norway | Europe | 65.000000 | 11.000000 |
| 990 | 1033 | Narges Mohammadi | female | 1972-04-21 | 51 | Zanjan | Iran | Asia | 32.000000 | 53.000000 |


# Country ranking

Rank the countries by the number of times they have been awarded in any category.
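
The counting logic behind such a ranking can be sketched with the standard library before turning to pandas. The `countries` list below is made-up toy data, not the real laureate countries; `Counter.most_common()` plays the role that `value_counts()` plays on a DataFrame column.

```python
from collections import Counter

# Made-up toy data standing in for the 'Country' column
countries = ["USA", "France", "USA", "Unknown", "Germany", "France", "USA"]

# Leave out laureates whose country could not be determined, then count
counts = Counter(country for country in countries if country != "Unknown")

# most_common() returns (country, count) pairs sorted from highest to lowest count
ranking = counts.most_common()

for position, (country, awards) in enumerate(ranking, start=1):
    print(position, country, awards)
```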

In [59]:
filtered_df = laureates_df[laureates_df['Country'] != 'Unknown']

country_award_count = filtered_df['Country'].value_counts()

country_award_df = country_award_count.reset_index()
country_award_df.columns = ['Country', 'Number_of_Awards']

country_award_df = country_award_df.sort_values(by='Number_of_Awards', ascending=False)
country_award_df

| | Country | Number_of_Awards |
|---|---|---|
| 0 | USA | 289 |
| 1 | United Kingdom | 89 |
| 2 | Germany | 80 |
| 3 | France | 58 |
| 4 | Sweden | 30 |
| ... | ... | ... |
| 68 | Ethiopia | 1 |
| 69 | Lebanon | 1 |
| 70 | Philippines | 1 |
| 72 | Guadeloupe Island | 1 |
