# Lab | APIs

In this lab, you will collect historical data about Nobel Prize winners using [this free, unauthenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes), the base URL is "http://api.nobelprize.org/2.1/", followed by a string that specifies what kind of information you want to retrieve. The accepted options are:

* nobelPrizes
* nobelPrize/category/year
* laureates
* laureate/laureateID
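
As a rough illustration, the four options above could be turned into full URLs as follows. The helper name `build_url` is made up for this sketch, and the sample values (the category code `phy`, the year, and the laureate ID) are assumptions used only to show the shape of each path.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.nobelprize.org/2.1/"


def build_url(*parts, **params):
    """Join path parts onto the base URL and append optional query parameters."""
    path = "/".join(str(part) for part in parts)
    query = urlencode(params)
    return BASE_URL + path + ("?" + query if query else "")


print(build_url("nobelPrizes", limit=5))
# Assumption: "phy" is used here as an example category code.
print(build_url("nobelPrize", "phy", 1901))
print(build_url("laureates"))
print(build_url("laureate", 160))
```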

# Getting the information using requests

Use the Python `requests` and `json` libraries to obtain the information on ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The JSON output is plain text that needs to be converted into the corresponding nested dictionary. Use the `.json()` method to parse the output into a Python dictionary.
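
To make the text-to-dictionary step concrete, here is a small offline sketch. The string `raw_text` is a hand-written sample that only imitates the shape of the API response; on a real response, `response.json()` does roughly what `json.loads(response.text)` does here.

```python
import json

# Hand-written sample imitating the response body (an assumption, not real API output).
raw_text = '{"nobelPrizes": [{"awardYear": "1901", "category": {"en": "Physics"}}]}'

# Parse the plain text into nested Python dictionaries and lists.
data = json.loads(raw_text)
first_prize = data["nobelPrizes"][0]

print(type(data).__name__)
print(first_prize["category"]["en"])
```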

Use the pandas library to collect all the information into a pandas DataFrame.

In [3]:
import requests
import json
import pandas as pd

url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code == 200:
    print("All good!")
    print("==============")
    print("\n")
    # Your code here

All good!


There are 670 nobel prize awards


| | awardYear | dateAwarded | prizeAmount | prizeAmountAdjusted | links | laureates | category.en | category.no | category.se | categoryFullName.en | categoryFullName.no | categoryFullName.se | topMotivation.en | topMotivation.se |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1901 | 1901-11-12 | 150782 | 10531894 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '160', 'knownName': {'en': 'Jacobus H.... | Chemistry | Kjemi | Kemi | The Nobel Prize in Chemistry | Nobelprisen i kjemi | Nobelpriset i kemi | | |
| 1 | 1901 | 1901-11-14 | 150782 | 10531894 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '569', 'knownName': {'en': 'Sully Prud... | Literature | Litteratur | Litteratur | The Nobel Prize in Literature | Nobelprisen i litteratur | Nobelpriset i litteratur | | |
| 2 | 1901 | 1901-12-10 | 150782 | 10531894 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '462', 'knownName': {'en': 'Henry Duna... | Peace | Fred | Fred | The Nobel Peace Prize | Nobels fredspris | Nobels fredspris | | |
| 3 | 1901 | 1901-11-12 | 150782 | 10531894 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '1', 'knownName': {'en': 'Wilhelm Conr... | Physics | Fysikk | Fysik | The Nobel Prize in Physics | Nobelprisen i fysikk | Nobelpriset i fysik | | |
| 4 | 1901 | 1901-10-30 | 150782 | 10531894 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '293', 'knownName': {'en': 'Emil von B... | Physiology or Medicine | Fysiologi eller medisin | Fysiologi eller medicin | The Nobel Prize in Physiology or Medicine | Nobelprisen i fysiologi eller medisin | Nobelpriset i fysiologi eller medicin | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 665 | 2023 | 2023-10-09 | 11000000 | 11000000 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '1034', 'knownName': {'en': 'Claudia G... | Economic Sciences | Økonomi | Ekonomi | The Sveriges Riksbank Prize in Economic Scienc... | Sveriges Riksbanks pris i økonomisk vitenskap ... | Sveriges Riksbanks pris i ekonomisk vetenskap ... | | |
| 666 | 2023 | 2023-10-05 | 11000000 | 11000000 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '1032', 'knownName': {'en': 'Jon Fosse... | Literature | Litteratur | Litteratur | The Nobel Prize in Literature | Nobelprisen i litteratur | Nobelpriset i litteratur | | |
| 667 | 2023 | 2023-10-06 | 11000000 | 11000000 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '1033', 'knownName': {'en': 'Narges Mo... | Peace | Fred | Fred | The Nobel Peace Prize | Nobels fredspris | Nobels fredspris | | |
| 668 | 2023 | 2023-10-03 | 11000000 | 11000000 | [{'rel': 'nobelPrize', 'href': 'https://api.no... | [{'id': '1026', 'knownName': {'en': 'Pierre Ag... | Physics | Fysikk | Fysik | The Nobel Prize in Physics | Nobelprisen i fysikk | Nobelpriset i fysik | | |


# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)
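
One possible way to derive these columns is to flatten each prize dictionary into a single record before building the DataFrame. The sketch below is only an illustration: the function name `summarize_prize` is invented here, the sample prize is hand-written with placeholder motivation text, and taking the motivation from the first laureate is an assumption about where that field lives.

```python
from datetime import datetime


def summarize_prize(prize):
    """Flatten one prize dictionary into the columns listed above."""
    laureates = prize.get("laureates", [])
    raw_date = prize.get("dateAwarded")
    return {
        "category": prize.get("category", {}).get("en"),
        # Parse "yyyy-mm-dd" text into a date object; keep None when it is missing.
        "dateAwarded": datetime.strptime(raw_date, "%Y-%m-%d").date() if raw_date else None,
        "prizeAmount": prize.get("prizeAmount"),
        "prizeAmountAdjusted": prize.get("prizeAmountAdjusted"),
        "Number_of_laureates": len(laureates),
        # Assumption: each laureate entry carries its own motivation; use the first one.
        "motivation": laureates[0].get("motivation", {}).get("en") if laureates else None,
        "laureate_ids": [int(laureate["id"]) for laureate in laureates],
    }


# Hand-written sample prize (placeholder motivation text, not real API output).
sample_prize = {
    "category": {"en": "Peace"},
    "dateAwarded": "1901-12-10",
    "prizeAmount": 150782,
    "prizeAmountAdjusted": 10531894,
    "laureates": [
        {"id": "462", "motivation": {"en": "placeholder motivation"}},
        {"id": "463", "motivation": {"en": "placeholder motivation"}},
    ],
}

record = summarize_prize(sample_prize)
print(record)
```

With the real data, `pd.DataFrame([summarize_prize(p) for p in prizes_list])` would then produce one row per prize.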

In [None]:
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code == 200:
    prizes_list = response.json()['nobelPrizes']
    # Your code here


| | category | dateAwarded | prizeAmount | prizeAmountAdjusted | Number_of_laureates | motivation | laureate_ids |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Chemistry | 1901-11-12 | 150782 | 10531894 | 1.0 | in recognition of the extraordinary services h... | [160] |
| 1 | Literature | 1901-11-14 | 150782 | 10531894 | 1.0 | in special recognition of his poetic compositi... | [569] |
| 2 | Peace | 1901-12-10 | 150782 | 10531894 | 2.0 | for his humanitarian efforts to help wounded s... | [462, 463] |
| 3 | Physics | 1901-11-12 | 150782 | 10531894 | 1.0 | in recognition of the extraordinary services h... | [1] |
| 4 | Physiology or Medicine | 1901-10-30 | 150782 | 10531894 | 1.0 | for his work on serum therapy, especially its ... | [293] |


In [14]:
df.columns

Index(['awardYear', 'dateAwarded', 'prizeAmount', 'prizeAmountAdjusted',
       'links', 'laureates', 'category.en', 'category.no', 'category.se',
       'categoryFullName.en', 'categoryFullName.no', 'categoryFullName.se',
       'topMotivation.en', 'topMotivation.se'],
      dtype='object')

In [13]:
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"
response = requests.get(url)

if response.status_code == 200:
    prizes_list = response.json()['nobelPrizes']
    # Flatten the nested JSON; nested keys become dotted column names such as 'category.en'
    df = pd.json_normalize(prizes_list)

    # Select the columns of interest; .copy() gives an independent DataFrame,
    # which avoids pandas' SettingWithCopyWarning on the assignments below
    df_filtered = df[['category.en', 'dateAwarded', 'prizeAmount', 'prizeAmountAdjusted', 'laureates', 'topMotivation.en']].copy()
    df_filtered.columns = ['category', 'dateAwarded', 'prizeAmount', 'prizeAmountAdjusted', 'laureates', 'motivation']

    # Parse the date strings into datetime values (displayed as yyyy-mm-dd)
    df_filtered['dateAwarded'] = pd.to_datetime(df_filtered['dateAwarded'])

    # Calculate the number of laureates
    df_filtered['Number_of_laureates'] = df_filtered['laureates'].apply(lambda x: len(x) if isinstance(x, list) else 0)

    # Extract laureate IDs into a list
    df_filtered['laureate_ids'] = df_filtered['laureates'].apply(lambda x: [laureate['id'] for laureate in x] if isinstance(x, list) else [])

    # Drop the original 'laureates' column
    df_filtered = df_filtered.drop(columns=['laureates'])

    # Display the processed DataFrame
    print(df_filtered.head())
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")


# Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some laureate_ids, you will notice that the Nobel Prize was not always awarded to individuals. In some cases, the awards were given to institutions.

Get the unique ids from the previous datasets and prepare the following functions:

- get_name(laureate) (it should return the English 'fullName' of the individual or the 'orgName' of the institution)

- get_gender(laureate) (it should return the gender or 'Unknown' for individuals, and 'None' for institutions)

- get_birthdate(laureate) (it should return the birthdate when it's available or 'Unknown' otherwise)

- get_age(laureate) (it should return the age of the awarded individual, or 'Unknown' when it's not available or for institutions)

- get_city(laureate) (it should return the English name of the city when it's available or 'Unknown' otherwise)

- get_country(laureate) (it should return the English name of the country when it's available or 'Unknown' otherwise)

- get_continent(laureate) (it should return the English name of the continent when it's available or 'Unknown' otherwise)

- get_latitude(laureate) (it should return the city's latitude when it's available or 'Unknown' otherwise)

- get_longitude(laureate) (it should return the city's longitude when it's available or 'Unknown' otherwise)
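
A few of these getters might look like the sketch below. It is not a reference solution. It assumes each laureate has already been parsed into a dictionary in which people carry `fullName`, `gender`, and `birth` keys, organizations carry `orgName` and `founded` keys, and award dates sit under `nobelPrizes`. Both sample dictionaries are hand-written for illustration, and the organization is entirely hypothetical. The helper `_origin` is also invented for this sketch.

```python
from datetime import date


def _origin(laureate):
    """Return the 'birth' block of a person or the 'founded' block of an organization."""
    return laureate.get("birth") or laureate.get("founded") or {}


def get_name(laureate):
    for key in ("fullName", "orgName"):
        if key in laureate:
            return laureate[key].get("en", "Unknown")
    return "Unknown"


def get_gender(laureate):
    if "orgName" in laureate:
        return "None"
    return laureate.get("gender", "Unknown")


def get_birthdate(laureate):
    return laureate.get("birth", {}).get("date", "Unknown")


def get_age(laureate):
    """Age in whole years on the date of the first listed award."""
    birth = get_birthdate(laureate)
    prizes = laureate.get("nobelPrizes", [])
    awarded = prizes[0].get("dateAwarded") if prizes else None
    if birth == "Unknown" or not awarded:
        return "Unknown"
    try:
        born, award = date.fromisoformat(birth), date.fromisoformat(awarded)
    except ValueError:
        # Incomplete or malformed dates cannot be turned into an age.
        return "Unknown"
    # Subtract one year if the birthday had not yet occurred on the award date.
    return award.year - born.year - ((award.month, award.day) < (born.month, born.day))


def get_city(laureate):
    return _origin(laureate).get("place", {}).get("city", {}).get("en", "Unknown")


# Hand-written samples (the response structure is an assumption).
person = {
    "fullName": {"en": "Wilhelm Conrad Röntgen"},
    "gender": "male",
    "birth": {"date": "1845-03-27", "place": {"city": {"en": "Lennep"}}},
    "nobelPrizes": [{"dateAwarded": "1901-11-12"}],
}
organization = {
    "orgName": {"en": "Example Institute"},  # hypothetical organization
    "founded": {"date": "1900-01-01", "place": {"city": {"en": "Example City"}}},
}

for laureate in (person, organization):
    print(get_name(laureate), get_gender(laureate), get_birthdate(laureate),
          get_age(laureate), get_city(laureate))
```

The remaining getters (country, continent, latitude, longitude) would follow the same nested `.get()` pattern with a default of 'Unknown'.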

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` in the previous DataFrame, make an API call to get the details of the awarded individual/institution, then iterate over the keys of the previous dictionaries to append the corresponding information for that `laureate_id` to the empty lists of `laureates_dict`.

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.
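
The fill-in loop can be tried offline before any API call is made. In the sketch below, `fake_responses` is a hand-written stand-in for the parsed API responses (an assumption about their shape), and only three of the keys are used to keep the example short.

```python
def get_name(laureate):
    return laureate.get("fullName", laureate.get("orgName", {})).get("en", "Unknown")


def get_gender(laureate):
    return "None" if "orgName" in laureate else laureate.get("gender", "Unknown")


laureates_dict = {"ID": [], "Name": [], "Gender": []}
functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender}

# Stand-in for the API: maps a laureate ID to an already-parsed response.
fake_responses = {
    1: {"fullName": {"en": "Wilhelm Conrad Röntgen"}, "gender": "male"},
    5: {"fullName": {"en": "Pierre Curie"}, "gender": "male"},
}

for laureate_id in sorted(fake_responses):
    laureate = fake_responses[laureate_id]
    for key, function in functions_dict.items():
        # "ID" has no getter function, so the ID itself is stored.
        value = laureate_id if function is None else function(laureate)
        laureates_dict[key].append(value)

print(laureates_dict)
```

Because every key receives exactly one value per laureate, all lists end up with the same length, which is what `pd.DataFrame(laureates_dict)` needs.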

In [None]:
import time
from tqdm import tqdm


ids = [int(item) for l in df_filtered['laureate_ids'].values for item in l]
unique_ids = set(ids)

def get_name(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here


def get_gender(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here

def get_birthdate(laureate):

    birth_date = "Unknown"

    # Person
    # Your code here

    # Organization
    # Your code here

    return birth_date

def get_age(laureate):

    birth_date = "Unknown"
    award_date = "Unknown"

    # Person
    # Your code here

    # Organization
    # Your code here

    # Award date
    # Your code here

def get_city(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here

def get_country(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here


def get_continent(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here

def get_latitude(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here

def get_longitude(laureate):

    # Person
    # Your code here

    # Organization
    # Your code here


laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}

for index, id in enumerate(tqdm(unique_ids)):

    url = "http://api.nobelprize.org/2.1/laureate/" + str(id)
    response = requests.get(url)

    if response.status_code == 200:

        laureate = response.json()

        # Your code here

laureates_df = pd.DataFrame(laureates_dict)

laureates_df



100%|██████████| 992/992 [10:16<00:00,  1.61it/s]


| | ID | Name | Gender | Birth_date | Age | City | Country | Continent | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Wilhelm Conrad Röntgen | male | 1845-03-27 00:00:00 | | Lennep | Germany | Europe | 51.178742 | 7.189696 |
| 1 | 2 | Hendrik Antoon Lorentz | male | 1853-07-18 00:00:00 | | Arnhem | the Netherlands | Europe | 51.984257 | 5.910857 |
| 2 | 3 | Pieter Zeeman | male | 1865-05-25 00:00:00 | | Zonnemaire | the Netherlands | Europe | 51.713056 | 3.951111 |
| 3 | 4 | Antoine Henri Becquerel | male | 1852-12-15 00:00:00 | | Paris | France | Europe | 48.860093 | 2.355954 |
| 4 | 5 | Pierre Curie | male | 1859-05-15 00:00:00 | | Paris | France | Europe | 48.860093 | 2.355954 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 987 | 1030 | Louis E. Brus | male | Unknown | | Cleveland, OH | USA | North America | 41.496386 | -81.710675 |
| 988 | 1031 | Aleksey Yekimov | male | Unknown | | Leningrad | Russia | Europe | 59.956651 | 30.333547 |
| 989 | 1032 | Jon Fosse | male | 1959-09-29 00:00:00 | | Haugesund | Norway | Europe | 59.410150 | 5.275511 |
| 990 | 1033 | Narges Mohammadi | female | 1972-04-21 00:00:00 | | Zanjan | Iran | Asia | 36.666667 | 48.483333 |


# Country ranking

Rank the countries by the number of times they have been awarded a prize in any category.
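
The counting idea behind such a ranking can be sketched without pandas. The example below swaps in `collections.Counter` from the standard library, and the `countries` list is a made-up sample standing in for the `Country` column.

```python
from collections import Counter

# Made-up sample standing in for the "Country" column of laureates_df.
countries = ["USA", "France", "USA", "Germany", "France", "USA"]

# most_common() returns (country, count) pairs sorted from most to least frequent.
ranking = Counter(countries).most_common()

for country, awards in ranking:
    print(country, awards)
```

On the real DataFrame, `laureates_df.groupby("Country")["ID"].count().sort_values(ascending=False)` is one pandas way to obtain a comparable table.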

In [None]:
# Your code here

| Country | ID |
| --- | --- |
| USA | 296 |
| United Kingdom | 91 |
| Germany | 84 |
| France | 63 |
| Russia | 30 |
| ... | ... |
| Greece | 1 |
| Ghana | 1 |
| Faroe Islands (Denmark) | 1 |
| Ethiopia | 1 |
