# Lab | APIs

In this lab, you will collect historical data about the Nobel Prize winners using [this free and non-authenticated API](https://www.nobelprize.org/organization/developer-zone-2/). According to the documentation available [here](https://app.swaggerhub.com/apis/NobelMedia/NobelMasterData/2.1#/default/get_nobelPrizes), the base URL is "http://api.nobelprize.org/2.1/", followed by a string that specifies what kind of information you want to retrieve. The acceptable options are:

* nobelPrizes
* nobelPrize/category/year
* laureates
* laureate/laureateID
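
As a minimal sketch of how these options combine with the base URL, the helper below joins the pieces into a full request URL. The name `build_url` is hypothetical (it is not part of the lab), and the category code `che` is taken from the `links` entry that appears in the API output further down.

```python
BASE_URL = "http://api.nobelprize.org/2.1/"

def build_url(endpoint, *parts):
    """Join the base URL, an endpoint name, and optional path parts."""
    segments = [endpoint, *(str(part) for part in parts)]
    return BASE_URL + "/".join(segments)

# All prizes, one prize by category and year, and one laureate by ID
print(build_url("nobelPrizes"))
print(build_url("nobelPrize", "che", 1901))
print(build_url("laureate", 160))
```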

# Getting the information using requests

Use the Python `requests` and `json` libraries to obtain the information about ALL the Nobel Prizes. Make sure to verify that you get the proper status code (200).

The JSON output is plain text that needs to be converted into the corresponding nested dictionary. Use the `.json()` method to parse the output into a Python dictionary.

Use the pandas library to collect all the information into a pandas DataFrame.
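
A single request may not return ALL the prizes if the API serves its results in pages. The sketch below assumes that the `nobelPrizes` endpoint accepts `offset` and `limit` query parameters; this is an assumption to check against the API documentation, and the names `fetch_all_prizes` and `fetch_page_from_api` are hypothetical. The paging loop takes the page fetcher as an argument, so it can be tried out without network access.

```python
def fetch_all_prizes(fetch_page, page_size=100):
    """Request pages until one comes back shorter than page_size."""
    prizes, offset = [], 0
    while True:
        page = fetch_page(offset, page_size).get("nobelPrizes", [])
        prizes.extend(page)
        if len(page) < page_size:
            return prizes
        offset += page_size

def fetch_page_from_api(offset, limit):
    """Fetch one page of prizes from the API."""
    import requests  # imported here so the paging loop above runs without it

    # Assumption: the endpoint understands `offset` and `limit`
    response = requests.get(
        "https://api.nobelprize.org/2.1/nobelPrizes",
        params={"offset": offset, "limit": limit},
        timeout=10,
    )
    response.raise_for_status()  # raise an error on any non-2xx status code
    return response.json()
```

Under that assumption, `nobel_prizes = fetch_all_prizes(fetch_page_from_api)` would return one list that can be passed to `pd.DataFrame` in the same way as `data.get('nobelPrizes', [])`.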

In [18]:
import requests
import json
import pandas as pd

url = "https://api.nobelprize.org/2.1/nobelPrizes"

response = requests.get(url)



In [19]:
response.status_code == 200

True

In [20]:
data = response.json()
data

{'nobelPrizes': [{'awardYear': '1901',
   'category': {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'},
   'categoryFullName': {'en': 'The Nobel Prize in Chemistry',
    'no': 'Nobelprisen i kjemi',
    'se': 'Nobelpriset i kemi'},
   'dateAwarded': '1901-11-12',
   'prizeAmount': 150782,
   'prizeAmountAdjusted': 10531894,
   'links': [{'rel': 'nobelPrize',
     'href': 'https://api.nobelprize.org/2/nobelPrize/che/1901',
     'action': 'GET',
     'types': 'application/json'}],
   'laureates': [{'id': '160',
     'knownName': {'en': "Jacobus H. van 't Hoff"},
     'fullName': {'en': "Jacobus Henricus van 't Hoff"},
     'portion': '1',
     'sortOrder': '1',
     'motivation': {'en': 'in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions',
      'se': 'såsom ett erkännande av den utomordentliga förtjänst han inlagt genom upptäckten av lagarna för den kemiska dynamiken och för det osmotiska tryck

In [22]:
nobel_prizes = data.get('nobelPrizes', [])
    
df_nobel_prizes = pd.DataFrame(nobel_prizes)
print(df_nobel_prizes)
    

   awardYear                                           category  \
0       1901   {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}   
1       1901  {'en': 'Literature', 'no': 'Litteratur', 'se':...   
2       1901        {'en': 'Peace', 'no': 'Fred', 'se': 'Fred'}   
3       1901   {'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'}   
4       1901  {'en': 'Physiology or Medicine', 'no': 'Fysiol...   
5       1902   {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}   
6       1902  {'en': 'Literature', 'no': 'Litteratur', 'se':...   
7       1902        {'en': 'Peace', 'no': 'Fred', 'se': 'Fred'}   
8       1902   {'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'}   
9       1902  {'en': 'Physiology or Medicine', 'no': 'Fysiol...   
10      1903   {'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}   
11      1903  {'en': 'Literature', 'no': 'Litteratur', 'se':...   
12      1903        {'en': 'Peace', 'no': 'Fred', 'se': 'Fred'}   
13      1903   {'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'

In [23]:
df_nobel_prizes.columns

Index(['awardYear', 'category', 'categoryFullName', 'dateAwarded',
       'prizeAmount', 'prizeAmountAdjusted', 'links', 'laureates'],
      dtype='object')

# Processing the output

Process the Pandas DataFrame in order to have only the following columns:

- category
- dateAwarded (as DateTime in "yyyy-mm-dd" format)
- prizeAmount
- prizeAmountAdjusted
- Number_of_laureates
- motivation
- laureate_ids (as a list)

In [24]:


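# Keep only the English category name (the isinstance check makes the cell safe to re-run)
df_nobel_prizes['category'] = df_nobel_prizes['category'].apply(lambda c: c['en'] if isinstance(c, dict) else c)
df_nobel_prizes['dateAwarded'] = pd.to_datetime(df_nobel_prizes['dateAwarded'])
# Prizes without laureates have no list in this column; replace such values with an empty list
df_nobel_prizes['laureates'] = df_nobel_prizes['laureates'].apply(lambda x: x if isinstance(x, list) else [])
df_nobel_prizes['Number_of_laureates'] = df_nobel_prizes['laureates'].apply(len)
# Use the first laureate's English motivation as the motivation of the prize
df_nobel_prizes['motivation'] = df_nobel_prizes['laureates'].apply(lambda x: x[0].get('motivation', {}).get('en', '') if x else '')
df_nobel_prizes['laureate_ids'] = df_nobel_prizes['laureates'].apply(lambda x: [laureate['id'] for laureate in x])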

df_refined = df_nobel_prizes[['category', 'dateAwarded', 'prizeAmount', 'prizeAmountAdjusted', 'Number_of_laureates', 'motivation', 'laureate_ids']]

print(df_refined.head())

                 category dateAwarded  prizeAmount  prizeAmountAdjusted  \
0               Chemistry  1901-11-12       150782             10531894   
1              Literature  1901-11-14       150782             10531894   
2                   Peace  1901-12-10       150782             10531894   
3                 Physics  1901-11-12       150782             10531894   
4  Physiology or Medicine  1901-10-30       150782             10531894   

   Number_of_laureates                                         motivation  \
0                    1  in recognition of the extraordinary services h...   
1                    1  in special recognition of his poetic compositi...   
2                    2  for his humanitarian efforts to help wounded s...   
3                    1  in recognition of the extraordinary services h...   
4                    1  for his work on serum therapy, especially its ...   

  laureate_ids  
0        [160]  
1        [569]  
2   [462, 463]  
3          [1]  
4

In [25]:
df_nobel_prizes.columns

Index(['awardYear', 'category', 'categoryFullName', 'dateAwarded',
       'prizeAmount', 'prizeAmountAdjusted', 'links', 'laureates',
       'Number_of_laureates', 'motivation', 'laureate_ids'],
      dtype='object')

# Getting a Pandas DataFrame with the details of awarded authors/institutions

If you dive deeper and use the API to retrieve the details of some laureate_ids, you will notice that the Nobel Prize was not always awarded to individuals. In some cases, the awards were given to institutions.

Get the unique ids from the previous datasets and prepare the following functions:

- get_name(laureate) (it should return the English name: 'fullName' for an individual or 'orgName' for an institution)

- get_gender(laureate) (it should return the gender, or 'Unknown', for individuals, and 'None' for institutions)

- get_birthdate(laureate) (it should return the birthdate when it's available, or 'Unknown' otherwise)

- get_age(laureate) (it should return the age of the awarded individual, or 'Unknown' when it's not available or for institutions)

- get_city(laureate) (it should return the English name of the city when it's available, or 'Unknown' otherwise)

- get_country(laureate) (it should return the English name of the country when it's available, or 'Unknown' otherwise)

- get_continent(laureate) (it should return the English name of the continent when it's available, or 'Unknown' otherwise)

- get_latitude(laureate) (it should return the city's latitude when it's available, or 'Unknown' otherwise)

- get_longitude(laureate) (it should return the city's longitude when it's available, or 'Unknown' otherwise)

Create the following dictionaries:

```python
laureates_dict = {"ID": [], "Name": [], "Gender": [], \
                  "Birth_date": [], "Age": [], \
                  "City": [], "Country": [], "Continent": [], \
                  "Latitude": [], "Longitude": []}                        

functions_dict = {"ID": None, "Name": get_name, "Gender": get_gender, \
                  "Birth_date": get_birthdate, "Age": get_age, \
                  "City": get_city, "Country": get_country, "Continent": get_continent, \
                  "Latitude": get_latitude, "Longitude": get_longitude}
```

For each unique `laureate_id` in the previous DataFrame, make an API call to get the details of the awarded individual/institution, then iterate over the keys of the previous dictionaries to append the corresponding information for each `laureate_id` to the empty lists of `laureates_dict`.

Finally, create a Pandas DataFrame named `laureates_df` using the `laureates_dict`.
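
One way to obtain the unique ids is to flatten the `laureate_ids` column and drop the repeats. The helper below is a sketch; `collect_unique_ids` is a hypothetical name, and it accepts any iterable of lists, so a pandas Series such as `df_refined['laureate_ids']` would work as input.

```python
def collect_unique_ids(id_lists):
    """Flatten an iterable of ID lists, dropping duplicates but keeping first-seen order."""
    seen = {}
    for ids in id_lists:
        # Skip anything that is not a list (e.g. NaN for a prize without laureates)
        if isinstance(ids, list):
            for laureate_id in ids:
                seen.setdefault(laureate_id, None)
    return list(seen)

# Example with the ids of the first prizes, plus one repeated id
print(collect_unique_ids([["160"], ["569"], ["462", "463"], ["1"], ["160"]]))
```

A plain dictionary is used instead of a set so that the ids come out in the order in which they first appear in the DataFrame.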

In [27]:
from datetime import datetime

def get_name(laureate):
    if 'orgName' in laureate:
        return laureate['orgName']['en']
    return laureate['fullName']['en']

def get_gender(laureate):
    if 'orgName' in laureate:
        return 'None'  # institutions have no gender
    return laureate.get('gender', 'Unknown')

def get_birthdate(laureate):
    return laureate.get('birth', {}).get('date', 'Unknown')

def get_age(laureate):
    birthdate_str = get_birthdate(laureate)
    if birthdate_str == 'Unknown':
        return 'Unknown'
    try:
        birthdate = datetime.strptime(birthdate_str, '%Y-%m-%d').date()
        today = datetime.today().date()
        return today.year - birthdate.year - ((today.month, today.day) < (birthdate.month, birthdate.day))
    except ValueError:
        return 'Unknown'

def get_city(laureate):
    return laureate.get('birth', {}).get('place', {}).get('city', {}).get('en', 'Unknown')

def get_country(laureate):
    return laureate.get('birth', {}).get('place', {}).get('country', {}).get('en', 'Unknown')

def get_continent(laureate):
    return 'Unknown'

def get_latitude(laureate):
    return laureate.get('birth', {}).get('place', {}).get('location', {}).get('lat', 'Unknown')

def get_longitude(laureate):
    return laureate.get('birth', {}).get('place', {}).get('location', {}).get('lng', 'Unknown')



laureates_dict = {"ID": [], "Name": [], "Gender": [], "Birth_date": [], "Age": [],
                  "City": [], "Country": [], "Continent": [], "Latitude": [], "Longitude": []}

functions_dict = {"ID": lambda x: x.get('id', 'Unknown'), "Name": get_name, "Gender": get_gender,
                  "Birth_date": get_birthdate, "Age": get_age,
                  "City": get_city, "Country": get_country, "Continent": get_continent,
                  "Latitude": get_latitude, "Longitude": get_longitude}



In [33]:
import requests
import pandas as pd

unique_laureate_ids = [160, 569, 462, 463, 1, 293]

base_url = "http://api.nobelprize.org/2.1/laureate/"

laureates_dict = {"ID": [], "Name": [], "Gender": [], "Birth_date": [], "Age": [],
                  "City": [], "Country": [], "Continent": [], "Latitude": [], "Longitude": []}

for laureate_id in unique_laureate_ids:
    response = requests.get(f"{base_url}{laureate_id}", timeout=10)

    # Skip this laureate if the API call was not successful
    if response.status_code != 200:
        print(f"Failed to fetch details for laureate ID {laureate_id}")
        continue

    # The response is expected to be a list holding a single laureate record
    laureate_data = response.json()
    laureate_details = laureate_data[0] if isinstance(laureate_data, list) and laureate_data else None

    if laureate_details is None:
        print(f"No details found for laureate ID {laureate_id}")
        continue

    # Fill every column by calling the matching function from functions_dict;
    # the ID comes from the loop variable so that it stays an integer
    for key, func in functions_dict.items():
        value = laureate_id if key == "ID" else func(laureate_details)
        laureates_dict[key].append(value)

# Create a DataFrame from the laureates_dict
laureates_df = pd.DataFrame(laureates_dict)

# Display the first few rows to verify the DataFrame content
print(laureates_df)

    ID                          Name Gender  Birth_date  Age       City  \
0  160  Jacobus Henricus van 't Hoff   male  1852-08-30  171  Rotterdam   
1  569               Sully Prudhomme   male  1839-03-16  185      Paris   
2  462             Jean Henry Dunant   male  1828-05-08  196     Geneva   
3  463                Frédéric Passy   male  1822-05-20  202      Paris   
4    1        Wilhelm Conrad Röntgen   male  1845-03-27  179     Lennep   
5  293        Emil Adolf von Behring   male  1854-03-15  170   Hansdorf   

           Country Continent Latitude Longitude  
0  the Netherlands   Unknown  Unknown   Unknown  
1           France   Unknown  Unknown   Unknown  
2      Switzerland   Unknown  Unknown   Unknown  
3           France   Unknown  Unknown   Unknown  
4          Prussia   Unknown  Unknown   Unknown  
5          Prussia   Unknown  Unknown   Unknown  
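
The `Age` column above is measured against today's date, which is why laureates born in the 19th century show ages such as 171. If "age" is read instead as the age on the day of the award, a sketch like the following could be used. This reading is an assumption, and `age_on` is a hypothetical helper that takes the birthdate and a reference date such as `dateAwarded`, both as "yyyy-mm-dd" strings.

```python
from datetime import date

def age_on(birth_date_str, reference_date_str):
    """Return the completed years between two 'yyyy-mm-dd' dates, or 'Unknown'."""
    try:
        born = date.fromisoformat(birth_date_str)
        ref = date.fromisoformat(reference_date_str)
    except (TypeError, ValueError):
        # Missing or malformed dates cannot be turned into an age
        return 'Unknown'
    # Subtract one year if the birthday has not yet been reached in the reference year
    return ref.year - born.year - ((ref.month, ref.day) < (born.month, born.day))

# Jacobus H. van 't Hoff: born 1852-08-30, Chemistry prize dated 1901-11-12
print(age_on('1852-08-30', '1901-11-12'))  # 49
```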


# Country ranking

Rank the countries by the number of times they have been awarded in any category.

In [34]:
import pandas as pd

# Each row of laureates_df is one laureate, so counting the rows per birth country
# gives the ranking; value_counts() already sorts the counts in descending order
country_awards = laureates_df['Country'].value_counts()
country_awards


Country
France             2
Prussia            2
the Netherlands    1
Switzerland        1
Name: count, dtype: int64

In [35]:
# Converting the Series to a DataFrame for better readability, and resetting index to have a clear DataFrame structure


country_awards_df = country_awards.reset_index()
country_awards_df.columns = ['Country', 'Awards']

# Sort the countries by the number of awards desc
country_awards_df = country_awards_df.sort_values('Awards', ascending=False)

# Display the ranking
print(country_awards_df)

           Country  Awards
0           France       2
1          Prussia       2
2  the Netherlands       1
3      Switzerland       1


In [36]:
print(laureates_df.head())

    ID                          Name Gender  Birth_date  Age       City  \
0  160  Jacobus Henricus van 't Hoff   male  1852-08-30  171  Rotterdam   
1  569               Sully Prudhomme   male  1839-03-16  185      Paris   
2  462             Jean Henry Dunant   male  1828-05-08  196     Geneva   
3  463                Frédéric Passy   male  1822-05-20  202      Paris   
4    1        Wilhelm Conrad Röntgen   male  1845-03-27  179     Lennep   

           Country Continent Latitude Longitude  
0  the Netherlands   Unknown  Unknown   Unknown  
1           France   Unknown  Unknown   Unknown  
2      Switzerland   Unknown  Unknown   Unknown  
3           France   Unknown  Unknown   Unknown  
4          Prussia   Unknown  Unknown   Unknown  
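
The `Continent`, `Latitude` and `Longitude` columns are 'Unknown' for every row because `get_continent` always returns 'Unknown' and the `location` key looked up by the other two functions is not found in the responses. The sketch below shows one way to fill them, assuming the laureate record stores the continent under `birth.place.continent.en` and the coordinates under `birth.place.cityNow.latitude` / `longitude`. These key paths are assumptions to verify against a real response, `get_nested` is a hypothetical helper, and the sample record is made up for illustration.

```python
def get_nested(record, *keys, default='Unknown'):
    """Walk nested dictionaries, returning `default` as soon as a key is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

def get_continent(laureate):
    # Assumed layout: laureate['birth']['place']['continent']['en']
    return get_nested(laureate, 'birth', 'place', 'continent', 'en')

def get_latitude(laureate):
    # Assumed layout: laureate['birth']['place']['cityNow']['latitude']
    return get_nested(laureate, 'birth', 'place', 'cityNow', 'latitude')

def get_longitude(laureate):
    # Assumed layout: laureate['birth']['place']['cityNow']['longitude']
    return get_nested(laureate, 'birth', 'place', 'cityNow', 'longitude')

# Made-up record that follows the assumed layout (not real API output)
sample = {'birth': {'place': {'continent': {'en': 'Europe'},
                              'cityNow': {'latitude': '51.0', 'longitude': '7.0'}}}}
print(get_continent(sample), get_latitude(sample), get_longitude(sample))
```

Institutions have no `birth` entry, so all three functions fall back to 'Unknown' for them.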
