# Lab | APIs

In this lab, you will collect historical data about the Nobel Prize winners using [this free and non-authenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes), the base URL is "http://api.nobelprize.org/2.1/", followed by a string that specifies what kind of information you want to retrieve. The acceptable options are:

* nobelPrizes
* nobelPrize/category/year
* laureates
* laureate/laureateID
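
As a minimal sketch of how these options combine with the base URL, the helper below joins an endpoint name, optional path parts, and optional query parameters. The name `build_url` is hypothetical (it is not part of the API or of `requests`); the `che` category code and the `limit` parameter are taken from the URLs that appear later in this lab.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.nobelprize.org/2.1/"


def build_url(endpoint, *path_parts, **query):
    """Join the base URL, an endpoint name, optional path parts and query parameters."""
    # build_url is a hypothetical helper written for this illustration
    path = "/".join([endpoint, *map(str, path_parts)])
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    return url


print(build_url("nobelPrizes", limit=25))
print(build_url("nobelPrize", "che", 1901))
print(build_url("laureate", 160))
```

The resulting strings can be passed directly to `requests.get`.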

# Getting the information using requests

Use the Python `requests` and `json` libraries to obtain the information of ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The JSON output is plain text that needs to be converted into the corresponding nested dictionary. Use the `.json()` method to parse the output into a Python dictionary.
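
To illustrate what this conversion does, the sketch below parses a shortened sample of the response text with the standard `json` module. For a JSON body, calling `.json()` on a response should give a comparable nested structure. The sample string is abbreviated from the real output and keeps only three fields.

```python
import json

# Shortened sample of the text returned by the nobelPrizes endpoint
raw_text = (
    '{"nobelPrizes": [{"awardYear": "1901", '
    '"category": {"en": "Chemistry"}, "prizeAmount": 150782}]}'
)

# json.loads turns JSON objects into dicts and JSON arrays into lists
data = json.loads(raw_text)
first_prize = data["nobelPrizes"][0]

print(type(data).__name__)            # dict
print(first_prize["category"]["en"])  # Chemistry
print(first_prize["awardYear"])       # 1901 (still a string, as in the JSON)
```

Note that `awardYear` arrives as a string while `prizeAmount` arrives as an integer, which matters when doing arithmetic later.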

Use the pandas library to collect all the information into a pandas DataFrame.

In [1]:
import requests
import json
import pandas as pd

url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)
print(response)

<Response [200]>


In [2]:
if response.status_code == 200:
    print("All good!")
    print("==============")
    print("\n")

    # Parse the response JSON into a dictionary
    nobel = response.json()
    print(json.dumps(nobel))

else:
    print(f"Failed!!!. Status code: {response.status_code}")

All good!


{"nobelPrizes": [{"awardYear": "1901", "category": {"en": "Chemistry", "no": "Kjemi", "se": "Kemi"}, "categoryFullName": {"en": "The Nobel Prize in Chemistry", "no": "Nobelprisen i kjemi", "se": "Nobelpriset i kemi"}, "dateAwarded": "1901-11-12", "prizeAmount": 150782, "prizeAmountAdjusted": 10531894, "links": [{"rel": "nobelPrize", "href": "https://api.nobelprize.org/2/nobelPrize/che/1901", "action": "GET", "types": "application/json"}], "laureates": [{"id": "160", "knownName": {"en": "Jacobus H. van 't Hoff"}, "fullName": {"en": "Jacobus Henricus van 't Hoff"}, "portion": "1", "sortOrder": "1", "motivation": {"en": "in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions", "se": "s\u00e5som ett erk\u00e4nnande av den utomordentliga f\u00f6rtj\u00e4nst han inlagt genom uppt\u00e4ckten av lagarna f\u00f6r den kemiska dynamiken och f\u00f6r det osmotiska trycket i l\u00f6sningar"}, "link

In [3]:
nobel_prizes_list = []

# Iterate over the "nobelPrizes" list 
for prize in nobel["nobelPrizes"]:
    year = prize.get("awardYear", "")
    category = prize["category"].get("en", "")
    category_full_name = prize["categoryFullName"].get("en", "")
    date_awarded = prize.get("dateAwarded", "")
    prize_amount = prize.get("prizeAmount", 0)
    prize_amount_adjusted = prize.get("prizeAmountAdjusted", 0)
    links = prize.get("links")

    # Following the example output, create lists for the laureates' information
    laureate_ids = []
    laureate_known_names = []
    laureate_full_names = []
    motivations = []
    shares = []

    # Check if there are laureates and iterate over them
    laureates = prize.get("laureates", [])
    for laureate in laureates:
        laureate_ids.append(laureate.get("id", ""))
        laureate_known_names.append(laureate.get("knownName", {}).get("en", ""))
        laureate_full_names.append(laureate.get("fullName", {}).get("en", ""))
        motivations.append(laureate.get("motivation", {}).get("en", ""))
        shares.append(laureate.get("portion", ""))

          
    # Append to the list
    nobel_prizes_list.append({
        "Year": year,
        "Category": category,
        "Category Full Name": category_full_name,
        "Date Awarded": date_awarded,
        "Prize Amount": prize_amount,
        "Prize Amount Adjusted": prize_amount_adjusted,
        "Links": links,
        "Laureate ID": laureate_ids,
        "Laureate Known Name": laureate_known_names,
        "Laureate Full Name": laureate_full_names,
        "Motivation": motivations,
        "Share": shares
    })

df = pd.DataFrame(nobel_prizes_list)

display(df)


Unnamed: 0,Year,Category,Category Full Name,Date Awarded,Prize Amount,Prize Amount Adjusted,Links,Laureate ID,Laureate Known Name,Laureate Full Name,Motivation,Share
0,1901,Chemistry,The Nobel Prize in Chemistry,1901-11-12,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",[160],[Jacobus H. van 't Hoff],[Jacobus Henricus van 't Hoff],[in recognition of the extraordinary services ...,[1]
1,1901,Literature,The Nobel Prize in Literature,1901-11-14,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",[569],[Sully Prudhomme],[Sully Prudhomme],[in special recognition of his poetic composit...,[1]
2,1901,Peace,The Nobel Peace Prize,1901-12-10,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[462, 463]","[Henry Dunant, Frédéric Passy]","[Jean Henry Dunant, Frédéric Passy]",[for his humanitarian efforts to help wounded ...,"[1/2, 1/2]"
3,1901,Physics,The Nobel Prize in Physics,1901-11-12,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",[1],[Wilhelm Conrad Röntgen],[Wilhelm Conrad Röntgen],[in recognition of the extraordinary services ...,[1]
4,1901,Physiology or Medicine,The Nobel Prize in Physiology or Medicine,1901-10-30,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",[293],[Emil von Behring],[Emil Adolf von Behring],"[for his work on serum therapy, especially its...",[1]
...,...,...,...,...,...,...,...,...,...,...,...,...
665,2023,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,2023-10-09,11000000,11000000,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",[1034],[Claudia Goldin],[Claudia Goldin],[for having advanced our understanding of wome...,[1]
666,2023,Literature,The Nobel Prize in Literature,2023-10-05,11000000,11000000,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",[1032],[Jon Fosse],[Jon Fosse],[for his innovative plays and prose which give...,[1]
667,2023,Peace,The Nobel Peace Prize,2023-10-06,11000000,11000000,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",[1033],[Narges Mohammadi],[Narges Mohammadi],[for her fight against the oppression of women...,[1]
668,2023,Physics,The Nobel Prize in Physics,2023-10-03,11000000,11000000,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[1026, 1027, 1028]","[Pierre Agostini, Ferenc Krausz, Anne L’Huillier]","[Pierre Agostini, Ferenc Krausz, Anne L’Huillier]",[for experimental methods that generate attose...,"[1/3, 1/3, 1/3]"


# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)
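
Before working on the whole DataFrame, it can help to see the target shape for a single prize. The sketch below is one possible record-level version, assuming the field names seen in the API output above. The function name `summarize` is hypothetical and the motivation texts are placeholders, not real data.

```python
from datetime import datetime

# Shortened sample shaped like one element of the "nobelPrizes" list;
# the motivation strings are placeholders, not the real texts
prize = {
    "category": {"en": "Peace"},
    "dateAwarded": "1901-12-10",
    "prizeAmount": 150782,
    "prizeAmountAdjusted": 10531894,
    "laureates": [
        {"id": "462", "motivation": {"en": "first motivation (placeholder)"}},
        {"id": "463", "motivation": {"en": "second motivation (placeholder)"}},
    ],
}


def summarize(prize):
    """Reduce one prize record to the columns requested above."""
    laureates = prize.get("laureates", [])
    date_text = prize.get("dateAwarded")
    return {
        "category": prize["category"]["en"],
        # Parse "yyyy-mm-dd" text into a date object; keep None when missing
        "dateAwarded": datetime.strptime(date_text, "%Y-%m-%d").date() if date_text else None,
        "prizeAmount": prize.get("prizeAmount"),
        "prizeAmountAdjusted": prize.get("prizeAmountAdjusted"),
        "Number_of_laureates": len(laureates),
        # One string per prize, built only from that prize's own laureates
        "motivation": " | ".join(l.get("motivation", {}).get("en", "") for l in laureates),
        "laureate_ids": [l["id"] for l in laureates],
    }


row = summarize(prize)
print(row["dateAwarded"], row["Number_of_laureates"], row["laureate_ids"])
```

The key point is that the motivation is joined per prize, from that prize's laureates, rather than from a variable left over from an earlier loop.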

In [4]:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
new_df = df[["Category", "Date Awarded", "Prize Amount", "Prize Amount Adjusted", "Motivation", "Laureate ID"]].copy()

new_df["Date Awarded"] = pd.to_datetime(new_df["Date Awarded"], format="%Y-%m-%d")
# Join each prize's own list of motivations into a single string
new_df["Motivation"] = new_df["Motivation"].apply(" | ".join)
new_df["Number_of_laureates"] = new_df["Laureate ID"].apply(len)

display(new_df.head())


Unnamed: 0,Category,Date Awarded,Prize Amount,Prize Amount Adjusted,Motivation,Laureate ID,Number_of_laureates
0,Chemistry,1901-11-12,150782,10531894,in recognition of the extraordinary services ...,[160],1
1,Literature,1901-11-14,150782,10531894,in special recognition of his poetic composit...,[569],1
2,Peace,1901-12-10,150782,10531894,for his humanitarian efforts to help wounded ...,"[462, 463]",2
3,Physics,1901-11-12,150782,10531894,in recognition of the extraordinary services ...,[1],1
4,Physiology or Medicine,1901-10-30,150782,10531894,"for his work on serum therapy, especially its...",[293],1


### Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some laureate_ids, you will notice that the Nobel Prize was not always awarded to individuals. In some cases, the awards were given to institutions.

Get the unique ids from the previous datasets and prepare the following functions:

- get_name(laureate) ( it should return the English name "fullName" of the individual or "orgName" of the institution )

- get_gender(laureate) ( it should return the gender or "Unknown" for individuals, and "None" for institutions )

- get_birthdate(laureate) ( it should return the birthdate when it's available or "Unknown" otherwise )

- get_age(laureate) ( it should return the age of the awarded individual, or "Unknown" when it's not available or for institutions )

- get_city(laureate) ( it should return the English name of the city when it's available or "Unknown" otherwise )

- get_country(laureate) ( it should return the English name of the country when it's available or "Unknown" otherwise )

- get_continent(laureate) ( it should return the English name of the continent when it's available or "Unknown" otherwise )

- get_latitude(laureate) ( it should return the city's latitude when it's available or "Unknown" otherwise )

- get_longitude(laureate) ( it should return the city's longitude when it's available or "Unknown" otherwise )

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` of the previous DataFrame, make an API call to get the details of the awarded individual/institution, then iterate over the keys of the previous dictionaries to add the corresponding information for each `laureate_id` to the empty lists of `laureates_dict`.

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.

In [5]:
from tqdm import tqdm

ids = [int(item) for l in df["Laureate ID"].values for item in l]
unique_ids = list(set(ids))
unique_ids.sort()
laureate_data: list[dict] = []

for laureate_id in tqdm(unique_ids):
    url = f"http://api.nobelprize.org/2.1/laureate/{laureate_id}"
    response = requests.get(url)

    if response.status_code == 200:
        # The parsed response is a list of laureate records, so extend rather than append
        laureate_data.extend(response.json())
    else:
        print(f"Id {laureate_id} failed!!!, Status code: {response.status_code}")


100%|██████████| 992/992 [02:21<00:00,  7.03it/s]
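
Making one request per ID takes a couple of minutes here. A possible alternative is to read the `laureates` endpoint in pages. The sketch below shows only the paging logic, with the actual request injected as a function so that it can be tried offline. It assumes that the endpoint accepts `offset` and `limit` query parameters and returns its items under a `"laureates"` key; both assumptions should be checked against the API documentation. The names `fetch_all` and `fake_fetch_page` are hypothetical.

```python
def fetch_all(fetch_page, page_size=100, key="laureates"):
    """Collect items page by page until a page comes back short or empty."""
    items = []
    offset = 0
    while True:
        # fetch_page is expected to return the parsed JSON of one page
        page = fetch_page(offset=offset, limit=page_size).get(key, [])
        items.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return items


# Stand-in for the real request: serves slices of an in-memory list
fake_records = [{"id": str(i)} for i in range(1, 8)]


def fake_fetch_page(offset, limit):
    return {"laureates": fake_records[offset:offset + limit]}


all_items = fetch_all(fake_fetch_page, page_size=3)
print(len(all_items))  # 7
```

With `requests`, the injected function could wrap a call such as `requests.get(url, params={"offset": offset, "limit": limit}).json()`, again assuming the API supports those parameters.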


In [6]:
def get_name(laureate):

    if "fullName" in laureate:
        return laureate["fullName"].get("en", "Unknown")
    elif "orgName" in laureate:
        return laureate["orgName"].get("en", "Unknown")
    return "Unknown"


def get_gender(laureate):

    # Individuals carry a "gender" field; institutions carry "orgName" instead
    if "gender" in laureate:
        return laureate["gender"]
    elif "orgName" in laureate:
        return "None"
    return "Unknown"


def get_birthdate(laureate):

    return laureate.get("birth", {}).get("date", "Unknown")


def get_age(laureate):

    birth_date = get_birthdate(laureate)

    if birth_date != "Unknown":
        # Use only the year, since some birth dates have an unknown month/day ("1943-00-00")
        birth_year = int(birth_date.split("-")[0])
        # Age in the year of the laureate's first listed prize
        award_year = int(laureate["nobelPrizes"][0]["awardYear"])
        return award_year - birth_year
    return "Unknown"
  

def get_city(laureate):
   
    return laureate.get("birth", {}).get("place", {}).get("city", {}).get("en", "Unknown")


def get_country(laureate):

    return laureate.get("birth", {}).get("place", {}).get("country", {}).get("en", "Unknown")


def get_continent(laureate):

    return laureate.get("birth", {}).get("place", {}).get("continent", {}).get("en", "Unknown")


def get_latitude(laureate):

    return laureate.get("birth", {}).get("place", {}).get("cityNow", {}).get("latitude", "Unknown")


def get_longitude(laureate):

    return laureate.get("birth", {}).get("place", {}).get("cityNow", {}).get("longitude", "Unknown")


laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}

# Loop over the laureates and fill each list using the matching function
for laureate in laureate_data:
    for key, func in functions_dict.items():
        # "ID" has no extractor function, so read it directly from the record
        value = laureate["id"] if func is None else func(laureate)
        laureates_dict[key].append(value)

laureates_df = pd.DataFrame(laureates_dict)

display(laureates_df)


Unnamed: 0,ID,Name,Gender,Birth_date,Age,City,Country,Continent,Latitude,Longitude
0,1,Wilhelm Conrad Röntgen,male,1845-03-27,56,Lennep,Prussia,Europe,51.178742,7.189696
1,2,Hendrik Antoon Lorentz,male,1853-07-18,49,Arnhem,the Netherlands,Europe,51.984257,5.910857
2,3,Pieter Zeeman,male,1865-05-25,37,Zonnemaire,the Netherlands,Europe,51.713056,3.951111
3,4,Antoine Henri Becquerel,male,1852-12-15,51,Paris,France,Europe,48.860093,2.355954
4,5,Pierre Curie,male,1859-05-15,44,Paris,France,Europe,48.860093,2.355954
...,...,...,...,...,...,...,...,...,...,...
987,1030,Louis E. Brus,male,1943-00-00,80,"Cleveland, OH",USA,North America,41.496386,-81.710675
988,1031,Aleksey Yekimov,male,1945-00-00,78,Leningrad,USSR,Europe,59.956651,30.333547
989,1032,Jon Fosse,male,1959-09-29,64,Haugesund,Norway,Europe,59.410150,5.275511
990,1033,Narges Mohammadi,female,1972-04-21,51,Zanjan,Iran,Asia,36.666667,48.483333
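
Some birth dates in the table above, such as `1943-00-00`, have the month and day set to `00`. The sketch below shows why such values need special handling when computing an age. The helper names `parse_partial_date` and `age_at_award` are hypothetical and are meant only as an illustration of the year-only approach.

```python
from datetime import datetime


def parse_partial_date(text):
    """Return (year, month, day); month and day are None when given as 00."""
    year, month, day = (int(part) for part in text.split("-"))
    return year, month or None, day or None


def age_at_award(birth_text, award_year):
    """Approximate age as the award year minus the birth year."""
    birth_year, _, _ = parse_partial_date(birth_text)
    return award_year - birth_year


# A strict date parser rejects month 00, so it cannot be used directly
try:
    datetime.strptime("1943-00-00", "%Y-%m-%d")
except ValueError:
    print("strptime rejects month 00")

print(parse_partial_date("1943-00-00"))  # (1943, None, None)
print(age_at_award("1943-00-00", 2023))  # 80
```

Because only the year is used, the result can be off by one for laureates whose birthday falls after the award date.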


# Country ranking

Rank the countries by the number of times they have been awarded in any category.
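
As a small illustration of the counting step, the sketch below ranks a made-up list of country names with `collections.Counter`. The idea is comparable to counting the values of the `Country` column; the data here is invented for the example.

```python
from collections import Counter

# Made-up sample, one entry per laureate
countries = ["France", "USA", "Germany", "USA", "France", "USA"]

# most_common() returns (value, count) pairs sorted by descending count
ranking = Counter(countries).most_common()
print(ranking)  # [('USA', 3), ('France', 2), ('Germany', 1)]
```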

In [7]:
# Count the number of awards per country
country_counts = laureates_df["Country"].value_counts()

country_ranking_df = country_counts.reset_index()
country_ranking_df.columns = ["Country", "Number_of_Awards"]

# Sort by number of awards
country_ranking_df = country_ranking_df.sort_values(by="Number_of_Awards", ascending=False)

display(country_ranking_df)

Unnamed: 0,Country,Number_of_Awards
0,USA,289
1,United Kingdom,89
2,Germany,80
3,France,58
4,Sweden,30
...,...,...
69,Iraq,1
70,Ethiopia,1
71,Lebanon,1
73,Free City of Danzig,1


In [8]:
country_ranking_df.Country.nunique()

99

In [9]:
country_ranking_df.Country.unique()

array(['USA', 'United Kingdom', 'Germany', 'France', 'Sweden', 'Japan',
       'Unknown', 'Canada', 'the Netherlands', 'Switzerland', 'Italy',
       'Russia', 'Austria', 'Russian Empire', 'Prussia', 'Norway',
       'Austria-Hungary', 'Denmark', 'Scotland', 'China', 'India',
       'Australia', 'South Africa', 'Belgium', 'Hungary', 'Poland',
       'Spain', 'USSR', 'Egypt', 'Ireland', 'Northern Ireland',
       'British Mandate of Palestine', 'West Germany', 'Romania',
       'Austrian Empire', 'Argentina', 'Mexico', 'New Zealand', 'Korea',
       'Colombia', 'Iran', 'Turkey', 'Luxembourg', 'Guatemala',
       'Schleswig', 'Portugal', 'Liberia', 'East Timor', 'French Algeria',
       'Chile', 'British India', 'Finland', 'Ottoman Empire',
       'Mecklenburg', 'Hesse-Kassel', 'Java, Dutch East Indies',
       'Württemberg', 'Bavaria', 'Brazil', 'German-occupied Poland',
       'Venezuela', 'Southern Rhodesia', 'Vietnam', 'Burma', 'Costa Rica',
       'Tibet', 'Madagascar', 'Tuscany', '

In [10]:
# Define the mapping
country_mapping = {
    "USA": "USA",
    "United Kingdom": "United Kingdom",
    "Germany": "Germany",
    "France": "France",
    "Sweden": "Sweden",
    "Japan": "Japan",
    "Unknown": "Unknown",
    "Canada": "Canada",
    "the Netherlands": "Netherlands",
    "Switzerland": "Switzerland",
    "Italy": "Italy",
    "Russia": "Russia",
    "Austria": "Austria",
    "Russian Empire": "Russia",          
    "Prussia": "Germany",                
    "Norway": "Norway",
    "Austria-Hungary": "Austria",        
    "Denmark": "Denmark",
    "Scotland": "United Kingdom",        
    "China": "China",
    "India": "India",
    "Australia": "Australia",
    "South Africa": "South Africa",
    "Belgium": "Belgium",
    "Hungary": "Hungary",
    "Poland": "Poland",
    "Spain": "Spain",
    "USSR": "Russia",                    
    "Egypt": "Egypt",
    "Ireland": "Ireland",
    "Northern Ireland": "United Kingdom", 
    "British Mandate of Palestine": "Israel",  
    "West Germany": "Germany",           
    "Romania": "Romania",
    "Austrian Empire": "Austria",        
    "Argentina": "Argentina",
    "Mexico": "Mexico",
    "New Zealand": "New Zealand",
    "Korea": "South Korea",              
    "Colombia": "Colombia",
    "Iran": "Iran",
    "Turkey": "Turkey",
    "Luxembourg": "Luxembourg",
    "Guatemala": "Guatemala",
    "Schleswig": "Germany",              
    "Portugal": "Portugal",
    "Liberia": "Liberia",
    "East Timor": "East Timor",
    "French Algeria": "Algeria",         
    "Chile": "Chile",
    "British India": "India",            
    "Finland": "Finland",
    "Ottoman Empire": "Turkey",          
    "Mecklenburg": "Germany",            
    "Hesse-Kassel": "Germany",           
    "Java, Dutch East Indies": "Indonesia",
    "Württemberg": "Germany",            
    "Bavaria": "Germany",                
    "Brazil": "Brazil",
    "German-occupied Poland": "Poland",  
    "Venezuela": "Venezuela",
    "Southern Rhodesia": "Zimbabwe",     
    "Vietnam": "Vietnam",
    "Burma": "Myanmar",                  
    "Costa Rica": "Costa Rica",
    "Tibet": "China",                    
    "Madagascar": "Madagascar",
    "Tuscany": "Italy",                  
    "East Friesland": "Germany",         
    "Iceland": "Iceland",
    "Taiwan": "Taiwan",
    "Bosnia": "Bosnia and Herzegovina",  
    "Crete": "Greece",                   
    "Bulgaria": "Bulgaria",
    "Faroe Islands (Denmark)": "Denmark",
    "Philippines": "Philippines",
    "Nigeria": "Nigeria",
    "Peru": "Peru",
    "British Protectorate of Palestine": "Israel",
    "Kenya": "Kenya",
    "Gold Coast": "Ghana",               
    "Trinidad and Tobago": "Trinidad and Tobago",
    "Czechoslovakia": "Czech Republic",  
    "British West Indies": "Caribbean",  
    "Lithuania": "Lithuania",
    "Cyprus": "Cyprus",
    "Persia": "Iran",                    
    "Saint Lucia": "Saint Lucia",
    "Guadeloupe Island": "France",       
    "Yemen": "Yemen",
    "Morocco": "Morocco",
    "Pakistan": "Pakistan",
    "Ukraine": "Ukraine",
    "Belgian Congo": "Democratic Republic of the Congo", 
    "Iraq": "Iraq",
    "Ethiopia": "Ethiopia",
    "Lebanon": "Lebanon",
    "Free City of Danzig": "Poland",     
    "French protectorate of Tunisia": "Tunisia"  
}

# Apply the mapping to the "Country" column
laureates_df["Country"] = laureates_df["Country"].map(country_mapping)

display(laureates_df)
    


Unnamed: 0,ID,Name,Gender,Birth_date,Age,City,Country,Continent,Latitude,Longitude
0,1,Wilhelm Conrad Röntgen,male,1845-03-27,56,Lennep,Germany,Europe,51.178742,7.189696
1,2,Hendrik Antoon Lorentz,male,1853-07-18,49,Arnhem,Netherlands,Europe,51.984257,5.910857
2,3,Pieter Zeeman,male,1865-05-25,37,Zonnemaire,Netherlands,Europe,51.713056,3.951111
3,4,Antoine Henri Becquerel,male,1852-12-15,51,Paris,France,Europe,48.860093,2.355954
4,5,Pierre Curie,male,1859-05-15,44,Paris,France,Europe,48.860093,2.355954
...,...,...,...,...,...,...,...,...,...,...
987,1030,Louis E. Brus,male,1943-00-00,80,"Cleveland, OH",USA,North America,41.496386,-81.710675
988,1031,Aleksey Yekimov,male,1945-00-00,78,Leningrad,Russia,Europe,59.956651,30.333547
989,1032,Jon Fosse,male,1959-09-29,64,Haugesund,Norway,Europe,59.410150,5.275511
990,1033,Narges Mohammadi,female,1972-04-21,51,Zanjan,Iran,Asia,36.666667,48.483333
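
When `Series.map` is given a dictionary, values that are missing from the dictionary become `NaN`, which is why the mapping above lists every country, including the unchanged ones. A shorter alternative is to list only the names that change and fall back to the original name otherwise. The sketch below shows that idea in plain Python; `normalize_country` is a hypothetical helper, and the three entries are taken from the mapping above.

```python
# Only the names that actually change need to be listed
historical_to_modern = {
    "Prussia": "Germany",
    "the Netherlands": "Netherlands",
    "USSR": "Russia",
}


def normalize_country(name, mapping=historical_to_modern):
    """Return the mapped name, or the original name when no entry exists."""
    return mapping.get(name, name)


names = ["Prussia", "France", "USSR", "Unknown"]
print([normalize_country(n) for n in names])
# ['Germany', 'France', 'Russia', 'Unknown']
```

In pandas, passing the same short dictionary to `Series.replace` should behave similarly, since values without an entry are left as they are.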


In [11]:
country_ranking_df.Country.nunique()


99

In [12]:
# Count the number of awards per country
country_counts = laureates_df["Country"].value_counts()

country_ranking_df = country_counts.reset_index()
country_ranking_df.columns = ["Country", "Number_of_Awards"]

# Sort by number of awards
country_ranking_df = country_ranking_df.sort_values(by="Number_of_Awards", ascending=False)

display(country_ranking_df)

Unnamed: 0,Country,Number_of_Awards
0,USA,289
1,Germany,105
2,United Kingdom,105
3,France,59
4,Russia,41
...,...,...
45,Philippines,1
44,Ukraine,1
43,Pakistan,1
42,Morocco,1
