# Lab | APIS

In this lab, you will collect historical data about the Nobel Prize winners using [this free and non-authenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes). The base url is: "http://api.nobelprize.org/2.1/" followed by a string to specify what kind of information do you want to retrieve. The acceptable options are:

* nobelPrices
* nobelPrice/category/year
* laureates
* laureate/laureateID

# Getting the information using requests

Use the Python `requests`, and `json` libraries to obtain the information of ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The json outputs are simple plain text that need to be converted into the corresponding nested dictionary. Use the `.json()` method to cast the output into a Python dictionary.

Use the Pandas library to collect all the information into a Panda's DataFrame.

In [1]:
import requests
import json
import pandas as pd

In [1]:
#Initial code
#url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

#response = requests.get(url)

#if response.status_code == 200:
#        print("All good!")
#       print("==============")
#        print("\n")
#       # Your code here

All good!




In [2]:
#Answer
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code == 200:
        print("All good!")
        print("==============")
        print("\n")
        
        # Parse JSON response
        data = response.json()
        
        # Extract relevant data (nobelPrizes here)
        nobel_prizes = data.get('nobelPrizes', [])
        
        # Convert to DataFrame
        nobel_prizes_df = pd.DataFrame(nobel_prizes)
        
        # Check first lines of df
        print(nobel_prizes_df.head())

All good!


  awardYear                                           category  \
0      1901   {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}   
1      1901  {'en': 'Literature', 'no': 'Litteratur', 'se':...   
2      1901        {'en': 'Peace', 'no': 'Fred', 'se': 'Fred'}   
3      1901   {'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'}   
4      1901  {'en': 'Physiology or Medicine', 'no': 'Fysiol...   

                                    categoryFullName dateAwarded  prizeAmount  \
0  {'en': 'The Nobel Prize in Chemistry', 'no': '...  1901-11-12       150782   
1  {'en': 'The Nobel Prize in Literature', 'no': ...  1901-11-14       150782   
2  {'en': 'The Nobel Peace Prize', 'no': 'Nobels ...  1901-12-10       150782   
3  {'en': 'The Nobel Prize in Physics', 'no': 'No...  1901-11-12       150782   
4  {'en': 'The Nobel Prize in Physiology or Medic...  1901-10-30       150782   

   prizeAmountAdjusted                                              links  \
0             10531894  [{'

In [10]:
#checking
#json
print(response.content)
#df
print(nobel_prizes_df.info())
print(nobel_prizes_df.columns)

b'{"nobelPrizes":[{"awardYear":"1901","category":{"en":"Chemistry","no":"Kjemi","se":"Kemi"},"categoryFullName":{"en":"The Nobel Prize in Chemistry","no":"Nobelprisen i kjemi","se":"Nobelpriset i kemi"},"dateAwarded":"1901-11-12","prizeAmount":150782,"prizeAmountAdjusted":10531894,"links":[{"rel":"nobelPrize","href":"https://api.nobelprize.org/2/nobelPrize/che/1901","action":"GET","types":"application/json"}],"laureates":[{"id":"160","knownName":{"en":"Jacobus H. van \'t Hoff"},"fullName":{"en":"Jacobus Henricus van \'t Hoff"},"portion":"1","sortOrder":"1","motivation":{"en":"in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions","se":"s\xc3\xa5som ett erk\xc3\xa4nnande av den utomordentliga f\xc3\xb6rtj\xc3\xa4nst han inlagt genom uppt\xc3\xa4ckten av lagarna f\xc3\xb6r den kemiska dynamiken och f\xc3\xb6r det osmotiska trycket i l\xc3\xb6sningar"},"links":[{"rel":"laureate","href":"https://api.

# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)

In [13]:
nobel_prizes_df.dtypes

awardYear              object
category               object
categoryFullName       object
dateAwarded            object
prizeAmount             int64
prizeAmountAdjusted     int64
links                  object
laureates              object
topMotivation          object
dtype: object

In [16]:
nobel_prizes_df.describe(include='all')

Unnamed: 0,awardYear,category,categoryFullName,dateAwarded,prizeAmount,prizeAmountAdjusted,links,laureates,topMotivation
count,670.0,670,670,628,670.0,670.0,670,621,58
unique,123.0,6,6,481,,,670,621,12
top,2023.0,"{'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}","{'en': 'The Nobel Prize in Chemistry', 'no': '...",1926-11-11,,,"[{'rel': 'nobelPrize', 'href': 'https://api.no...","[{'id': '160', 'knownName': {'en': 'Jacobus H....",{'en': 'No Nobel Prize was awarded this year. ...
freq,6.0,123,123,5,,,1,1,24
mean,,,,,2939398.0,7125483.0,,,
std,,,,,3918669.0,3795054.0,,,
min,,,,,114935.0,2922455.0,,,
25%,,,,,148822.0,3673307.0,,,
50%,,,,,335000.0,5754417.0,,,
75%,,,,,7350000.0,10630280.0,,,


In [25]:
#checking
display(nobel_prizes_df2['laureates'].iloc[0])
display(nobel_prizes_df2['laureates'].iloc[1])

[{'id': '160',
  'knownName': {'en': "Jacobus H. van 't Hoff"},
  'fullName': {'en': "Jacobus Henricus van 't Hoff"},
  'portion': '1',
  'sortOrder': '1',
  'motivation': {'en': 'in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions',
   'se': 'såsom ett erkännande av den utomordentliga förtjänst han inlagt genom upptäckten av lagarna för den kemiska dynamiken och för det osmotiska trycket i lösningar'},
  'links': [{'rel': 'laureate',
    'href': 'https://api.nobelprize.org/2/laureate/160',
    'action': 'GET',
    'types': 'application/json'}]}]

[{'id': '569',
  'knownName': {'en': 'Sully Prudhomme'},
  'fullName': {'en': 'Sully Prudhomme'},
  'portion': '1',
  'sortOrder': '1',
  'motivation': {'en': 'in special recognition of his poetic composition, which gives evidence of lofty idealism, artistic perfection and a rare combination of the qualities of both heart and intellect',
   'se': 'såsom ett erkännande av hans utmärkta, jämväl under senare år ådagalagda förtjänster som författare och särskilt av hans om hög idealitet, konstnärlig fulländning samt sällspord förening av hjärtats och snillets egenskaper vittnande diktning'},
  'links': [{'rel': 'laureate',
    'href': 'https://api.nobelprize.org/2/laureate/569',
    'action': 'GET',
    'types': 'application/json'}]}]

In [11]:

#Answer
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code == 200:
        print("All good!")
        print("==============")
        print("\n")
        
        # Parse JSON response
        data = response.json()
        
        # Extract relevant data (nobelPrizes here)
        nobel_prizes = data.get('nobelPrizes', [])
        
        # Convert to DataFrame
        nobel_prizes_df = pd.DataFrame(nobel_prizes)

        #Keep only required columns
        nobel_prizes_df2 = nobel_prizes_df.drop(columns=['links'])

        #Checking/changing format of column dateAwarded
        nobel_prizes_df2['dateAwarded'] = pd.to_datetime(nobel_prizes_df2['dateAwarded']).dt.strftime('%Y-%m-%d')

        #Adding the column laureate_ids
        def extract_ids(laureates_list):
            if isinstance(laureates_list, list):  # Check if the item is a list
                return ','.join([str(laureate['id']) for laureate in laureates_list])
            else:
                return ''  # Return empty string if it's not a list

        # Apply the function to the DataFrame
        nobel_prizes_df2['laureate_id'] = nobel_prizes_df2['laureates'].apply(extract_ids)


        # Check first lines of df
        display(nobel_prizes_df2.head())

All good!




Unnamed: 0,awardYear,category,categoryFullName,dateAwarded,prizeAmount,prizeAmountAdjusted,laureates,topMotivation,laureate_id
0,1901,"{'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}","{'en': 'The Nobel Prize in Chemistry', 'no': '...",1901-11-12,150782,10531894,"[{'id': '160', 'knownName': {'en': 'Jacobus H....",,160
1,1901,"{'en': 'Literature', 'no': 'Litteratur', 'se':...","{'en': 'The Nobel Prize in Literature', 'no': ...",1901-11-14,150782,10531894,"[{'id': '569', 'knownName': {'en': 'Sully Prud...",,569
2,1901,"{'en': 'Peace', 'no': 'Fred', 'se': 'Fred'}","{'en': 'The Nobel Peace Prize', 'no': 'Nobels ...",1901-12-10,150782,10531894,"[{'id': '462', 'knownName': {'en': 'Henry Duna...",,462463
3,1901,"{'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'}","{'en': 'The Nobel Prize in Physics', 'no': 'No...",1901-11-12,150782,10531894,"[{'id': '1', 'knownName': {'en': 'Wilhelm Conr...",,1
4,1901,"{'en': 'Physiology or Medicine', 'no': 'Fysiol...",{'en': 'The Nobel Prize in Physiology or Medic...,1901-10-30,150782,10531894,"[{'id': '293', 'knownName': {'en': 'Emil von B...",,293


# Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some laureate_ids, you will notice that not allways the Nobel Prize was awarded to individuals. In some cases, the awards were given to institutions.

Get the unique ids from the previous datasets and prepare the following functions:

- get_name(laureate) ( it should return the english name 'fullName' of the individual or 'orgName' of the institution )

- get_gender(laureate) ( it should return the gender or 'Unknown' for individuals, and 'None' for institutions )

- get_birthdate(laureate) ( it should return the birthdate when it's avaialble or 'Unknown' otherwise )

- get_age(laureate) ( it should return the age of the awarded individual or 'Unknown' when it's not avaialble or for institutions )

- get_city(laureate) ( it should return the english name of the city when it's available or 'Unknown' otherwise )

- get_country(laureate) ( it should return the english name of the country when it's available or 'Unknown' otherwise )

- get_continent(laureate) ( it should return the english name of the continent when it's available or 'Unknown' otherwise )

- get_latitude(laureate) ( it should return the city's latitude when it's available or 'Unknown' otherwise )

- get_longitude(laureate) ( it should return the city's longitude
 when it's available or 'Unknown' otherwise )

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` of the previous DataFrame make an API call to get the details of the awarded individual/intitution and iterate of the previous dictionaries keys in order to add the corresponding information of each `laureate_id` in the empty lists of `laureates_dict`.

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.

In [10]:
#Answer
unique_ids = nobel_prizes_df2['laureate_id'].unique().tolist()
url2 = "http://api.nobelprize.org/2.1/laureates?limit=100000"

response2 = requests.get(url2)

if response2.status_code == 200:
        print("All good!")
        print("==============")
        print("\n")
        
        # Parse JSON response
        data2 = response2.json()
        response.headers
        # Extract relevant data (laureates)
        laureates = data2.get('laureates', [])
        
        # Convert to DataFrame
        laureates_df = pd.DataFrame(laureates)
        
        # Check first lines of df
        print(laureates_df.head())

All good!


     id                                          knownName  \
0   745  {'en': 'A. Michael Spence', 'se': 'A. Michael ...   
1   102       {'en': 'Aage N. Bohr', 'se': 'Aage N. Bohr'}   
2   779  {'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...   
3   259           {'en': 'Aaron Klug', 'se': 'Aaron Klug'}   
4  1004  {'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...   

                                  givenName  \
0  {'en': 'A. Michael', 'se': 'A. Michael'}   
1        {'en': 'Aage N.', 'se': 'Aage N.'}   
2            {'en': 'Aaron', 'se': 'Aaron'}   
3            {'en': 'Aaron', 'se': 'Aaron'}   
4  {'en': 'Abdulrazak', 'se': 'Abdulrazak'}   

                                   familyName  \
0            {'en': 'Spence', 'se': 'Spence'}   
1                {'en': 'Bohr', 'se': 'Bohr'}   
2  {'en': 'Ciechanover', 'se': 'Ciechanover'}   
3                {'en': 'Klug', 'se': 'Klug'}   
4            {'en': 'Gurnah', 'se': 'Gurnah'}   

                                     

In [9]:
laureates_df.columns

Index(['id', 'knownName', 'givenName', 'familyName', 'fullName', 'fileName',
       'gender', 'birth', 'wikipedia', 'wikidata', 'sameAs', 'links',
       'nobelPrizes', 'death', 'orgName', 'acronym', 'founded', 'nativeName',
       'penName', 'penNameOf', 'foundedCountry', 'foundedCountryNow',
       'foundedContinent'],
      dtype='object')

In [13]:
for column in laureates_df.columns:
    print(f"Column: {column}, First row value: {laureates_df[column].iloc[0]}")

Column: id, First row value: 745
Column: knownName, First row value: {'en': 'A. Michael Spence', 'se': 'A. Michael Spence'}
Column: givenName, First row value: {'en': 'A. Michael', 'se': 'A. Michael'}
Column: familyName, First row value: {'en': 'Spence', 'se': 'Spence'}
Column: fullName, First row value: {'en': 'A. Michael Spence', 'se': 'A. Michael Spence'}
Column: fileName, First row value: spence
Column: gender, First row value: male
Column: birth, First row value: {'date': '1943-00-00', 'place': {'city': {'en': 'Montclair, NJ', 'no': 'Montclair, NJ', 'se': 'Montclair, NJ'}, 'country': {'en': 'USA', 'no': 'USA', 'se': 'USA'}, 'cityNow': {'en': 'Montclair, NJ', 'no': 'Montclair, NJ', 'se': 'Montclair, NJ', 'sameAs': ['https://www.wikidata.org/wiki/Q678437', 'https://www.wikipedia.org/wiki/Montclair,_New_Jersey'], 'latitude': '40.825930', 'longitude': '-74.209030'}, 'countryNow': {'en': 'USA', 'no': 'USA', 'se': 'USA', 'sameAs': ['https://www.wikidata.org/wiki/Q30'], 'latitude': '39.8

In [None]:
#checking
display(nobel_prizes_df2['laureates'].iloc[0])
display(nobel_prizes_df2['laureates'].iloc[1])

In [None]:
def extract_ids(laureates_list):
        if isinstance(laureates_list, list):  # Check if the item is a list
            return ','.join([str(laureate['id']) for laureate in laureates_list])
        else:
            return ''  # Return empty string if it's not a list

# Apply the function to the DataFrame
nobel_prizes_df2['laureate_id'] = nobel_prizes_df2['laureates'].apply(extract_ids)




In [None]:
import time
from tqdm import tqdm


ids = [int(item) for l in df['laureate_ids'].values for item in l]
unique_ids = set(ids)

def get_name(laureate):
    for laureates_df['id']== laureate
    laureate_row = laureates_df[laureates_df['id'] == laureate_id].iloc[0]
    if pd.notna(laureate_row['fullName']):
        return laureate_row['fullName']
    else:
        return laureate_row['orgName']

def get_gender(laureate):
    for laureates_df['id']== laureate
    laureate_row = laureates_df[laureates_df['id'] == laureate_id].iloc[0]    
    if pd.notna(laureate['gender']):
        return laureate_row['gender']
    elif 'fullName' in laureate and pd.notnull(laureate['fullName']):
        return 'Unknown'
    else:
        return 'None'


def get_birthdate(laureate):
    for laureates_df['id']== laureate
    laureate_row = laureates_df[laureates_df['id'] == laureate_id].iloc[0] 
    

    # Person
    # Your code here

    # Organization
    # Yuor code here

    return birth_date

def get_age(laureate):

    birth_date = "Unknown"
    award_date = "Unknown"

    # Person
    # Your code here

    # Organization
    # Yuor code here

    # Award date
    # Your code here

def get_city(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here

def get_country(laureate):

    # Person
    # Your code here

    # Origanization
    # Your code here


def get_continent(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here

def get_latitude(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here

def get_longitude(laureate):

    # Person
    # Your code here

    # Organization
    # Yuor code here


laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}

for index, id in enumerate(tqdm(unique_ids)):

    url = "https://api.nobelprize.org/2/laureate/" + str(id)
    response = requests.get(url)

    if response.status_code == 200:

        laureate = response.json()

        # Your code here

laureates_df = pd.DataFrame(laureates_dict)

laureates_df



100%|██████████| 992/992 [10:16<00:00,  1.61it/s]


Unnamed: 0,ID,Name,Gender,Birth_date,Age,City,Country,Continent,Latitude,Longitude
0,1,Wilhelm Conrad Röntgen,male,1845-03-27 00:00:00,,Lennep,Germany,Europe,51.178742,7.189696
1,2,Hendrik Antoon Lorentz,male,1853-07-18 00:00:00,,Arnhem,the Netherlands,Europe,51.984257,5.910857
2,3,Pieter Zeeman,male,1865-05-25 00:00:00,,Zonnemaire,the Netherlands,Europe,51.713056,3.951111
3,4,Antoine Henri Becquerel,male,1852-12-15 00:00:00,,Paris,France,Europe,48.860093,2.355954
4,5,Pierre Curie,male,1859-05-15 00:00:00,,Paris,France,Europe,48.860093,2.355954
...,...,...,...,...,...,...,...,...,...,...
987,1030,Louis E. Brus,male,Unknown,,"Cleveland, OH",USA,North America,41.496386,-81.710675
988,1031,Aleksey Yekimov,male,Unknown,,Leningrad,Russia,Europe,59.956651,30.333547
989,1032,Jon Fosse,male,1959-09-29 00:00:00,,Haugesund,Norway,Europe,59.410150,5.275511
990,1033,Narges Mohammadi,female,1972-04-21 00:00:00,,Zanjan,Iran,Asia,36.666667,48.483333


# Country ranking

Get a ranking countries by the number of times that they had been awarded in any category.

In [None]:
# Your code here

Unnamed: 0_level_0,ID
Country,Unnamed: 1_level_1
USA,296
United Kingdom,91
Germany,84
France,63
Russia,30
...,...
Greece,1
Ghana,1
Faroe Islands (Denmark),1
Ethiopia,1
