# Lab | APIs

In this lab, you will collect historical data about Nobel Prize winners using [this free, non-authenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes), the base URL is "http://api.nobelprize.org/2.1/", followed by a path that specifies what kind of information you want to retrieve. The acceptable options are:

* nobelPrizes
* nobelPrize/category/year
* laureates
* laureate/laureateID
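
As a minimal sketch of how these paths combine with the base URL, the helper below joins path segments onto it. `build_url` is a hypothetical name used only for illustration; it is not part of the API or of `requests`. The laureate ID 462 is taken from the output shown later in this lab.

```python
BASE_URL = "http://api.nobelprize.org/2.1/"

def build_url(*segments):
    """Join path segments onto the base URL (hypothetical helper)."""
    path = "/".join(str(s).strip("/") for s in segments)
    return BASE_URL + path

print(build_url("nobelPrizes"))      # http://api.nobelprize.org/2.1/nobelPrizes
print(build_url("laureate", 462))    # http://api.nobelprize.org/2.1/laureate/462
```

Query parameters such as `limit` can then be appended after a `?`, or passed to `requests.get` through its `params` argument.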

# Getting the information using requests

Use the Python `requests` and `json` libraries to obtain the information for ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The JSON output is plain text that needs to be converted into the corresponding nested dictionary. Use the `.json()` method to parse the response body into a Python dictionary.
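
To make the conversion concrete, here is a small offline sketch. The `sample_text` payload is a hand-written assumption modeled on the response shape shown later in this lab, not real API output. For a `requests` response, `response.json()` does essentially the same thing as `json.loads(response.text)`.

```python
import json

# Hand-written stand-in for response.text (assumed shape, heavily trimmed)
sample_text = '{"nobelPrizes": [{"awardYear": "1901", "category": {"en": "Physics"}}]}'

data = json.loads(sample_text)          # plain text -> nested dict
first_prize = data["nobelPrizes"][0]    # the prizes are a list of dicts under one key

print(type(data).__name__)              # dict
print(first_prize["category"]["en"])    # Physics
```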

Use the pandas library to collect all the information into a pandas DataFrame.

In [16]:
import requests
import json
import pandas as pd
import time
from tqdm import tqdm
import re

url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = None
try:
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raise on any 4xx/5xx status
    info = response.json()
    print("Data retrieved successfully!")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")


if response is not None and response.status_code == 200:
    print("All good!")
    print("==============")
    print("\n")
    df = pd.DataFrame(info["nobelPrizes"])
    # Keep only prizes that list laureates (drops rows where "laureates" is missing)
    df = df.dropna(subset=["laureates"])

df.head(20)

Data retrieved successfully!
All good!




Unnamed: 0,awardYear,category,categoryFullName,dateAwarded,prizeAmount,prizeAmountAdjusted,links,laureates,topMotivation
0,1901,"{'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}","{'en': 'The Nobel Prize in Chemistry', 'no': '...",1901-11-12,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '160', 'knownName': {'en': 'Jacobus H....",
1,1901,"{'en': 'Literature', 'no': 'Litteratur', 'se':...","{'en': 'The Nobel Prize in Literature', 'no': ...",1901-11-14,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '569', 'knownName': {'en': 'Sully Prud...",
2,1901,"{'en': 'Peace', 'no': 'Fred', 'se': 'Fred'}","{'en': 'The Nobel Peace Prize', 'no': 'Nobels ...",1901-12-10,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '462', 'knownName': {'en': 'Henry Duna...",
3,1901,"{'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'}","{'en': 'The Nobel Prize in Physics', 'no': 'No...",1901-11-12,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '1', 'knownName': {'en': 'Wilhelm Conr...",
4,1901,"{'en': 'Physiology or Medicine', 'no': 'Fysiol...",{'en': 'The Nobel Prize in Physiology or Medic...,1901-10-30,150782,10531894,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '293', 'knownName': {'en': 'Emil von B...",
5,1902,"{'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}","{'en': 'The Nobel Prize in Chemistry', 'no': '...",1902-11-11,141847,9907798,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '161', 'knownName': {'en': 'Emil Fisch...",
6,1902,"{'en': 'Literature', 'no': 'Litteratur', 'se':...","{'en': 'The Nobel Prize in Literature', 'no': ...",1902-11-13,141847,9907798,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '571', 'knownName': {'en': 'Theodor Mo...",
7,1902,"{'en': 'Peace', 'no': 'Fred', 'se': 'Fred'}","{'en': 'The Nobel Peace Prize', 'no': 'Nobels ...",1902-12-10,141847,9907798,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '464', 'knownName': {'en': 'Élie Ducom...",
8,1902,"{'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'}","{'en': 'The Nobel Prize in Physics', 'no': 'No...",1902-11-11,141847,9907798,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '2', 'knownName': {'en': 'Hendrik A. L...",
9,1902,"{'en': 'Physiology or Medicine', 'no': 'Fysiol...",{'en': 'The Nobel Prize in Physiology or Medic...,1902-10-30,141847,9907798,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '294', 'knownName': {'en': 'Ronald Ros...",


In [4]:
df.laureates[2][1]

{'id': '463',
 'knownName': {'en': 'Frédéric Passy'},
 'fullName': {'en': 'Frédéric Passy'},
 'portion': '1/2',
 'sortOrder': '2',
 'motivation': {'en': 'for his lifelong work for international peace conferences, diplomacy and arbitration',
  'no': 'for sin  livslange innsats for internasjonale fredskonferanser, diplomati og voldgift'},
 'links': [{'rel': 'laureate',
   'href': 'https://api.nobelprize.org/2/laureate/463',
   'action': 'GET',
   'types': 'application/json'}]}

In [5]:
for x in df.laureates:
    if int(x[0]["id"]) == 462:
        print(x)

[{'id': '462', 'knownName': {'en': 'Henry Dunant'}, 'fullName': {'en': 'Jean Henry Dunant'}, 'portion': '1/2', 'sortOrder': '1', 'motivation': {'en': 'for his humanitarian efforts to help wounded soldiers and create international understanding', 'no': 'for sin humanitære innsats for å hjelpe sårede soldater og skape internasjonal forståelse'}, 'links': [{'rel': 'laureate', 'href': 'https://api.nobelprize.org/2/laureate/462', 'action': 'GET', 'types': 'application/json'}]}, {'id': '463', 'knownName': {'en': 'Frédéric Passy'}, 'fullName': {'en': 'Frédéric Passy'}, 'portion': '1/2', 'sortOrder': '2', 'motivation': {'en': 'for his lifelong work for international peace conferences, diplomacy and arbitration', 'no': 'for sin  livslange innsats for internasjonale fredskonferanser, diplomati og voldgift'}, 'links': [{'rel': 'laureate', 'href': 'https://api.nobelprize.org/2/laureate/463', 'action': 'GET', 'types': 'application/json'}]}]


# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)
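
A note on the `dateAwarded` requirement: a pandas datetime column has no display format of its own; "yyyy-mm-dd" is simply how it prints when every time component is midnight. The sketch below uses made-up sample values (not API output) to show parsing with `errors="coerce"`, which turns missing or malformed dates into `NaT` instead of raising, and `dt.strftime` for the cases where a string in that format is needed.

```python
import pandas as pd

# Made-up sample values; None stands in for a prize with no recorded date
raw = pd.Series(["1901-11-12", None, "1902-12-10"])

dates = pd.to_datetime(raw, errors="coerce")   # datetime column, NaT where missing
as_text = dates.dt.strftime("%Y-%m-%d")        # back to "yyyy-mm-dd" strings

print(dates.dtype)
print(as_text.tolist())
```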

In [33]:
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)
response.raise_for_status()  # stop here if the request failed

info = response.json()
df = pd.DataFrame(info["nobelPrizes"])

# Drop the columns that are not needed and the prizes without laureates
df = df.drop(columns=["categoryFullName", "awardYear", "links"])
df = df.dropna(subset=["laureates"]).copy()

df["dateAwarded"] = pd.to_datetime(df["dateAwarded"])
# Keep only the English category name
df["category"] = df["category"].map(lambda cat: cat["en"])
# Use the first laureate's English motivation for the prize
df["motivation"] = df["laureates"].map(lambda laur: laur[0]["motivation"]["en"])
df["Number_of_laureates"] = df["laureates"].apply(len)
df["laureate_ids"] = df["laureates"].apply(lambda x: [laur["id"] for laur in x])

final_columns = ["category", "dateAwarded", "prizeAmount", "prizeAmountAdjusted", "Number_of_laureates", "motivation", "laureate_ids"]
df = df[final_columns]


In [34]:
df

Unnamed: 0,category,dateAwarded,prizeAmount,prizeAmountAdjusted,Number_of_laureates,motivation,laureate_ids
0,Chemistry,1901-11-12,150782,10531894,1,in recognition of the extraordinary services h...,[160]
1,Literature,1901-11-14,150782,10531894,1,in special recognition of his poetic compositi...,[569]
2,Peace,1901-12-10,150782,10531894,2,for his humanitarian efforts to help wounded s...,"[462, 463]"
3,Physics,1901-11-12,150782,10531894,1,in recognition of the extraordinary services h...,[1]
4,Physiology or Medicine,1901-10-30,150782,10531894,1,"for his work on serum therapy, especially its ...",[293]
...,...,...,...,...,...,...,...
665,Economic Sciences,2023-10-09,11000000,11000000,1,for having advanced our understanding of women...,[1034]
666,Literature,2023-10-05,11000000,11000000,1,for his innovative plays and prose which give ...,[1032]
667,Peace,2023-10-06,11000000,11000000,1,for her fight against the oppression of women ...,[1033]
668,Physics,2023-10-03,11000000,11000000,3,for experimental methods that generate attosec...,"[1026, 1027, 1028]"


### Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some laureate_ids, you will notice that the Nobel Prize was not always awarded to individuals. In some cases, the awards were given to institutions.

Get the unique ids from the previous dataset and prepare the following functions:

- get_name(laureate) (it should return the English 'fullName' of the individual or the 'orgName' of the institution)

- get_gender(laureate) (it should return the gender, or 'Unknown', for individuals, and 'None' for institutions)

- get_birthdate(laureate) (it should return the birthdate when it's available or 'Unknown' otherwise)

- get_age(laureate) (it should return the age of the awarded individual, or 'Unknown' when it's not available or for institutions)

- get_city(laureate) (it should return the English name of the city when it's available or 'Unknown' otherwise)

- get_country(laureate) (it should return the English name of the country when it's available or 'Unknown' otherwise)

- get_continent(laureate) (it should return the English name of the continent when it's available or 'Unknown' otherwise)

- get_latitude(laureate) (it should return the city's latitude when it's available or 'Unknown' otherwise)

- get_longitude(laureate) (it should return the city's longitude when it's available or 'Unknown' otherwise)

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` in the previous DataFrame, make an API call to get the details of the awarded individual/institution, then iterate over the keys of the previous dictionaries to append the corresponding information for that `laureate_id` to the empty lists in `laureates_dict`.

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.

In [50]:
functions_dict

{'ID': None,
 'Name': <function __main__.get_name(laureate)>,
 'Gender': <function __main__.get_gender(laureate)>,
 'Birth_date': <function __main__.get_birthdate(laureate)>,
 'Age': <function __main__.get_age(laureate)>,
 'City': <function __main__.get_city(laureate)>,
 'Country': <function __main__.get_country(laureate)>,
 'Continent': <function __main__.get_continent(laureate)>,
 'Latitude': <function __main__.get_latitude(laureate)>,
 'Longitude': <function __main__.get_longitude(laureate)>}

In [79]:

ids = [int(item) for l in df["laureate_ids"].values for item in l]
unique_ids = set(ids)

def get_name(laureate):

    if "fullName" in laureate:
        return laureate["fullName"]["en"]
    elif "orgName" in laureate:
        return laureate["orgName"]["en"]
    return "Unknown"


def get_gender(laureate):

    if "gender" in laureate:
        if laureate["gender"] == "male" or laureate["gender"] == "female":
            return laureate["gender"]
        else:
            return "Unknown"
    elif "orgName" in laureate:
        return "None"
    return "Unknown"

def get_birthdate(laureate):

    birth_date = "Unknown"
    try:
        if "birth" in laureate:
            birth_date = pd.to_datetime(laureate["birth"]["date"])
    except Exception:
        # Partial dates such as "1898-00-00" cannot be parsed; log them and keep "Unknown"
        print(laureate["birth"].get("date"))
    return birth_date

def get_age(laureate):

    age = "Unknown"
    try:
        if "birth" in laureate:
            birth_year = pd.to_datetime(laureate["birth"]["date"]).year
            if "nobelPrizes" in laureate:
                # Age in the year of the first listed prize
                age = int(laureate["nobelPrizes"][0]["awardYear"]) - int(birth_year)
    except Exception:
        # Same unparseable dates as above; the age stays "Unknown"
        print(laureate["birth"].get("date"), laureate["nobelPrizes"][0]["awardYear"])

    return age
    

def _place_name(laureate, field):
    # Individuals have a "birth" place; organizations have a "founded" place
    for event in ("birth", "founded"):
        name = laureate.get(event, {}).get("place", {}).get(field, {})
        if "en" in name:
            return name["en"]
    # Reached whenever any level of the lookup is missing
    return "Unknown"


def get_city(laureate):
    return _place_name(laureate, "city")


def get_country(laureate):
    return _place_name(laureate, "country")


def get_continent(laureate):
    return _place_name(laureate, "continent")

def get_latitude(laureate):

    if "birth" in laureate:
        if "place" in laureate["birth"]:
            if "latitude" in laureate["birth"]["place"]:
                return laureate["birth"]["place"]["latitude"]
    else:
        return "Unknown"

def get_longitude(laureate):

    if "birth" in laureate:
        if "place" in laureate["birth"]:
            if "longitude" in laureate["birth"]["place"]:
                return laureate["birth"]["place"]["longitude"]
    else:
        return "Unknown"


laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}

# Sort the ids so the rows come out in a predictable order
for laureate_id in tqdm(sorted(unique_ids)):

    url = "https://api.nobelprize.org/2/laureate/" + str(laureate_id)
    response = requests.get(url)

    if response.status_code == 200:

        # The endpoint returns a list; its first element is the laureate record
        laureate = response.json()[0]

        for key, func in functions_dict.items():
            if key == "ID":
                laureates_dict[key].append(laureate_id)
            else:
                laureates_dict[key].append(func(laureate))
        
laureates_df = pd.DataFrame(laureates_dict)

laureates_df



100%|██████████| 992/992 [02:56<00:00,  5.62it/s]

1898-00-00
1898-00-00 1960
1943-00-00
1943-00-00 2001
1952-00-00
1952-00-00 2009
1959-00-00
1959-00-00 2011
1993-00-00
1993-00-00 2018
1955-00-00
1955-00-00 2018
1949-00-00
1949-00-00 2020
1967-00-00
1967-00-00 2021
1948-00-00
1948-00-00 2021
1961-00-00
1961-00-00 2021
1956-00-00
1956-00-00 2021
1954-00-00
1954-00-00 2022
1961-00-00
1961-00-00 2023
1943-00-00
1943-00-00 2023
1945-00-00
1945-00-00 2023
1946-00-00
1946-00-00 2023




Unnamed: 0,ID,Name,Gender,Birth_date,Age,City,Country,Continent,Latitude,Longitude
0,1,Wilhelm Conrad Röntgen,male,1845-03-27 00:00:00,56,Lennep,Prussia,Europe,,
1,2,Hendrik Antoon Lorentz,male,1853-07-18 00:00:00,49,Arnhem,the Netherlands,Europe,,
2,3,Pieter Zeeman,male,1865-05-25 00:00:00,37,Zonnemaire,the Netherlands,Europe,,
3,4,Antoine Henri Becquerel,male,1852-12-15 00:00:00,51,Paris,France,Europe,,
4,5,Pierre Curie,male,1859-05-15 00:00:00,44,Paris,France,Europe,,
...,...,...,...,...,...,...,...,...,...,...
987,1030,Louis E. Brus,male,Unknown,Unknown,"Cleveland, OH",USA,North America,,
988,1031,Aleksey Yekimov,male,Unknown,Unknown,Leningrad,USSR,Europe,,
989,1032,Jon Fosse,male,1959-09-29 00:00:00,64,Haugesund,Norway,Europe,,
990,1033,Narges Mohammadi,female,1972-04-21 00:00:00,51,Zanjan,Iran,Asia,,
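
Several birth dates come back with the month and day zeroed out (for example `1898-00-00`), which `pd.to_datetime` rejects; those are the values printed during the run, and the corresponding rows end up with 'Unknown' for both birth date and age. One possible workaround, sketched here under the assumption that the first four characters are always the year, is to fall back to the year alone when computing the age. `year_from_partial_date` and `age_at_award` are hypothetical helper names, not part of the lab's required functions.

```python
def year_from_partial_date(date_text):
    """Return the year from 'YYYY-MM-DD' text, even when MM-DD is '00-00'."""
    year = date_text[:4]
    return int(year) if year.isdigit() else None

def age_at_award(birth_date_text, award_year):
    """Approximate age as award year minus birth year, or 'Unknown'."""
    birth_year = year_from_partial_date(birth_date_text)
    if birth_year is None:
        return "Unknown"
    return int(award_year) - birth_year

print(age_at_award("1898-00-00", "1960"))  # 62
print(age_at_award("1845-03-27", 1901))    # 56
```

Because only the year is used, the result can differ by one from the laureate's exact age on the award date.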


In [86]:
url = "https://api.nobelprize.org/2/laureate/" + str(10)
response = requests.get(url)
laureate = response.json()
laureate[0]

if "birth" in laureate[0]:
    print("a")
    if "place" in laureate[0]["birth"]:
        print("b")

laureate[0]["birth"]["place"]["cityNow"]

a
b


{'en': 'Cheetham Hill',
 'no': 'Cheetham Hill',
 'se': 'Cheetham Hill',
 'sameAs': ['https://www.wikidata.org/wiki/Q2015708',
  'https://www.wikipedia.org/wiki/Cheetham,_Manchester'],
 'latitude': '53.502515',
 'longitude': '-2.235535'}
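
The output above shows that, at least for this laureate, the coordinates are nested under `cityNow` rather than directly under `place`. A hedged sketch of a lookup that walks such nested keys safely follows; `dig` is a hypothetical helper, and the `sample` record is trimmed by hand from the output above.

```python
def dig(record, *keys, default="Unknown"):
    """Follow keys through nested dicts; return default if any key is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Trimmed by hand from the response shown above
sample = {"birth": {"place": {"cityNow": {"en": "Cheetham Hill",
                                          "latitude": "53.502515",
                                          "longitude": "-2.235535"}}}}

print(dig(sample, "birth", "place", "cityNow", "latitude"))   # 53.502515
print(dig(sample, "birth", "place", "latitude"))              # Unknown
```

Whether every laureate record carries a `cityNow` entry is not something this lab verifies, so the default value matters.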

# Country ranking

Rank the countries by the number of times they have been awarded a prize in any category.

In [91]:
# Your code here

laureates_df.pivot_table(index="Country", values ="ID",aggfunc ="count").sort_values(by=['ID'], ascending=False)

Country,ID
USA,296
United Kingdom,91
Germany,80
France,60
Sweden,30
...,...
Crete,1
Pakistan,1
French protectorate of Tunisia,1
Peru,1
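
An equivalent and somewhat shorter way to build this kind of ranking is `Series.value_counts`, which counts and sorts in one step. The sketch uses a tiny made-up frame rather than `laureates_df`, so its counts are illustrative only, and filtering out 'Unknown' is an optional choice rather than part of the exercise.

```python
import pandas as pd

# Made-up sample; in the lab this would be laureates_df
toy = pd.DataFrame({"ID": [1, 2, 3, 4],
                    "Country": ["USA", "France", "USA", "Unknown"]})

# Skip laureates without a known country, then count and sort descending
ranking = toy.loc[toy["Country"] != "Unknown", "Country"].value_counts()

print(ranking)
```

Either approach counts laureates by the country name recorded in the data, so historical names such as Prussia or USSR stay separate from their modern equivalents.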
