# Lab | APIS

In this lab, you will collect historical data about the Nobel Prize winners using [this free and non-authenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes). The base url is: "http://api.nobelprize.org/2.1/" followed by a string to specify what kind of information do you want to retrieve. The acceptable options are:

* nobelPrices
* nobelPrice/category/year
* laureates
* laureate/laureateID

# Getting the information using requests

Use the Python `requests`, and `json` libraries to obtain the information of ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The json outputs are simple plain text that need to be converted into the corresponding nested dictionary. Use the `.json()` method to cast the output into a Python dictionary.

Use the Pandas library to collect all the information into a Panda's DataFrame.

In [7]:
import requests
import json
import pandas as pd

url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code == 200:
        print("All good!")
        print("==============")
        print("\n")
        # Your code here
    
        data = response.json()
        nobel_prizes = data['nobelPrizes']
        df = pd.DataFrame(nobel_prizes)
        print(df.head())
else:
    print(f"Failed to retrieve data: {response.status_code}")


All good!


  awardYear                                           category  \
0      1901   {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}   
1      1901  {'en': 'Literature', 'no': 'Litteratur', 'se':...   
2      1901        {'en': 'Peace', 'no': 'Fred', 'se': 'Fred'}   
3      1901   {'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'}   
4      1901  {'en': 'Physiology or Medicine', 'no': 'Fysiol...   

                                    categoryFullName dateAwarded  prizeAmount  \
0  {'en': 'The Nobel Prize in Chemistry', 'no': '...  1901-11-12       150782   
1  {'en': 'The Nobel Prize in Literature', 'no': ...  1901-11-14       150782   
2  {'en': 'The Nobel Peace Prize', 'no': 'Nobels ...  1901-12-10       150782   
3  {'en': 'The Nobel Prize in Physics', 'no': 'No...  1901-11-12       150782   
4  {'en': 'The Nobel Prize in Physiology or Medic...  1901-10-30       150782   

   prizeAmountAdjusted                                              links  \
0             10531894  [{'

# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)

In [10]:
url = "http://api.nobelprize.org/2.1/nobelPrizes?limit=100000"

response = requests.get(url)

if response.status_code == 200:
    prizes_list = response.json()['nobelPrizes']
    # Your code here

    df = pd.DataFrame(prizes_list)
    df['dateAwarded'] = pd.to_datetime(df['dateAwarded']).dt.strftime('%Y-%m-%d')
    df_cleaned = pd.DataFrame({
        'category': df['category'].apply(lambda x: x['en']),
        'dateAwarded': df['dateAwarded'],
        'prizeAmount': df['prizeAmount'],
        'prizeAmountAdjusted': df['prizeAmountAdjusted'],
        'Number_of_laureates': df['laureates'].apply(lambda x: len(x) if isinstance(x, list) else 0),
        'motivation': df['topMotivation'].apply(lambda x: x['en'] if pd.notna(x) else None),
        'laureate_ids': df['laureates'].apply(lambda x: [laureate['id'] for laureate in x] if isinstance(x, list) else [])
    })
    print(df_cleaned.head())
else:
    print(f"Failed to retrieve data: {response.status_code}")

                 category dateAwarded  prizeAmount  prizeAmountAdjusted  \
0               Chemistry  1901-11-12       150782             10531894   
1              Literature  1901-11-14       150782             10531894   
2                   Peace  1901-12-10       150782             10531894   
3                 Physics  1901-11-12       150782             10531894   
4  Physiology or Medicine  1901-10-30       150782             10531894   

   Number_of_laureates motivation laureate_ids  
0                    1       None        [160]  
1                    1       None        [569]  
2                    2       None   [462, 463]  
3                    1       None          [1]  
4                    1       None        [293]  


# Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some laureate_ids, you will notice that not allways the Nobel Prize was awarded to individuals. In some cases, the awards were given to institutions.

Get the unique ids from the previous datasets and prepare the following functions:

- get_name(laureate) ( it should return the english name 'fullName' of the individual or 'orgName' of the institution )

- get_gender(laureate) ( it should return the gender or 'Unknown' for individuals, and 'None' for institutions )

- get_birthdate(laureate) ( it should return the birthdate when it's avaialble or 'Unknown' otherwise )

- get_age(laureate) ( it should return the age of the awarded individual or 'Unknown' when it's not avaialble or for institutions )

- get_city(laureate) ( it should return the english name of the city when it's available or 'Unknown' otherwise )

- get_country(laureate) ( it should return the english name of the country when it's available or 'Unknown' otherwise )

- get_continent(laureate) ( it should return the english name of the continent when it's available or 'Unknown' otherwise )

- get_latitude(laureate) ( it should return the city's latitude when it's available or 'Unknown' otherwise )

- get_longitude(laureate) ( it should return the city's longitude
 when it's available or 'Unknown' otherwise )

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` of the previous DataFrame make an API call to get the details of the awarded individual/intitution and iterate of the previous dictionaries keys in order to add the corresponding information of each `laureate_id` in the empty lists of `laureates_dict`.

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.

In [13]:
# Check if the column 'laureate_ids' exists
if 'laureate_ids' not in df.columns:
    print("Column 'laureate_ids' does not exist in the DataFrame.")
    print("Available columns are:", df.columns)
else:
    # Proceed with extracting unique IDs if the column exists
    ids = [int(item) for l in df['laureate_ids'].values for item in l]
    unique_ids = set(ids)

Column 'laureate_ids' does not exist in the DataFrame.
Available columns are: Index(['awardYear', 'category', 'categoryFullName', 'dateAwarded',
       'prizeAmount', 'prizeAmountAdjusted', 'links', 'laureates',
       'topMotivation'],
      dtype='object')


In [14]:
# New cell: Create 'laureate_ids' column if it doesn't exist
if 'laureate_ids' not in df.columns:
    # Apply the function only if 'laureates' contains a valid list, otherwise return an empty list
    df['laureate_ids'] = df['laureates'].apply(lambda x: [l['id'] for l in x] if isinstance(x, list) else [])
    print("Created the 'laureate_ids' column.")



Created the 'laureate_ids' column.


In [15]:
import time
from tqdm import tqdm


ids = [int(item) for l in df['laureate_ids'].values for item in l]
unique_ids = set(ids)



def get_name(laureate):
    if 'fullName' in laureate:
        return laureate['fullName']['en']
    elif 'orgName' in laureate:
        return laureate['orgName']
    return "Unknown"

def get_gender(laureate):
    if 'gender' in laureate:
        return laureate['gender']
    return 'None'

def get_birthdate(laureate):
    birth_date = "Unknown"
    if 'birth' in laureate and 'date' in laureate['birth']:
        birth_date = laureate['birth']['date']
    return birth_date

def get_age(laureate):
    birth_date = "Unknown"
    award_date = "Unknown"
    if 'birth' in laureate and 'date' in laureate['birth']:
        birth_date = datetime.strptime(laureate['birth']['date'], '%Y-%m-%d')
    if 'awardYear' in laureate:
        award_date = datetime.strptime(laureate['awardYear'], '%Y')
        if isinstance(birth_date, datetime):
            age = award_date.year - birth_date.year
            return age
    return 'Unknown'

def get_city(laureate):
    if 'birth' in laureate and 'place' in laureate['birth']:
        return laureate['birth']['place'].get('city', 'Unknown')
    return 'Unknown'

def get_country(laureate):
    if 'birth' in laureate and 'place' in laureate['birth']:
        return laureate['birth']['place'].get('country', 'Unknown')
    return 'Unknown'

def get_continent(laureate):
    if 'birth' in laureate and 'place' in laureate['birth']:
        return laureate['birth']['place'].get('continent', 'Unknown')
    return 'Unknown'

def get_latitude(laureate):
    if 'birth' in laureate and 'place' in laureate['birth']:
        return laureate['birth']['place'].get('latitude', 'Unknown')
    return 'Unknown'

def get_longitude(laureate):
    if 'birth' in laureate and 'place' in laureate['birth']:
        return laureate['birth']['place'].get('longitude', 'Unknown')
    return 'Unknown'


laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}

for index, id in enumerate(tqdm(unique_ids)):

    url = "https://api.nobelprize.org/2/laureate/" + str(id)
    response = requests.get(url)

    if response.status_code == 200:

        laureate = response.json()
        for key, func in functions_dict.items():
            if key == "ID":
                laureates_dict[key].append(id)
            else:
                laureates_dict[key].append(func(laureate))
                
laureates_df = pd.DataFrame(laureates_dict)

laureates_df



100%|█████████████████████████████████████████| 992/992 [02:13<00:00,  7.44it/s]


Unnamed: 0,ID,Name,Gender,Birth_date,Age,City,Country,Continent,Latitude,Longitude
0,1,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
1,2,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
2,3,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
3,4,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
4,5,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
...,...,...,...,...,...,...,...,...,...,...
987,1030,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
988,1031,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
989,1032,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown
990,1033,Unknown,,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown,Unknown


# Country ranking

Get a ranking countries by the number of times that they had been awarded in any category.

In [18]:
# Your code here

country_count = laureates_df['Country'].value_counts()

country_ranking = pd.DataFrame({
    'Country': country_count.index,
    'Number of Prizes': country_count.values
})

country_ranking = country_ranking.sort_values(by='Number of Prizes', ascending=False)

country_ranking


Unnamed: 0,Country,Number of Prizes
0,Unknown,992
